processes and their applications.
science, mathematics, and management. The background required to study
the book is one year of calculus, elementary differential equations, matrix 
analysis, and some signal and system theory, including Fourier transforms. 
The book can be used as a self-contained textbook or for self-study. Each 
topic is introduced in a chapter with numerous solved problems. The solved
problems constitute an integral part of the text. 

This new edition includes and expands the contents of the first edition. 
In addition to refinement through the text, two new sections on probability- 
generating functions and martingales have been added and a new chapter on 
information theory has been added. 
advantages of both the textbook and the so-called review book. It provides
the textual explanations of the textbook, and in the direct way characteristic 
of the review book, it gives hundreds of completely solved problems that 
use essential theory and techniques. Moreover, the solved problems are an 
integral part of the text. The background required to study the book is one 
year of calculus, elementary differential equations, matrix analysis, and 
some signal and system theory, including Fourier transforms. 

I wish to thank Dr. Gordon Silverman for his invaluable suggestions and 
CHAPTER 1


Probability 


1.1 Introduction 


The study of probability stems from the analysis of certain games of chance, and it 
has found applications in most branches of science and engineering. In this chapter 
the basic concepts of probability theory are presented. 


1.2 Sample Space and Events 


A. Random Experiments: 


In the study of probability, any process of observation is referred to as an 
experiment. The results of an observation are called the outcomes of the 
experiment. An experiment is called a random experiment if its outcome cannot be 
predicted. Typical examples of a random experiment are the roll of a die, the toss 
of a coin, drawing a card from a deck, or selecting a message signal for 
transmission from several messages. 


B. Sample Space: 


The set of all possible outcomes of a random experiment is called the sample space 
(or universal set), and it is denoted by S. An element in S is called a sample point. 
Each outcome of a random experiment corresponds to a sample point. 


EXAMPLE 1.1 Find the sample space for the experiment of tossing a coin (a) 
once and (b) twice. 


(a) There are two possible outcomes, heads or tails. Thus: 


S = {H, T} 


where H and T represent head and tail, respectively. 


(b) There are four possible outcomes. They are pairs of heads and tails. Thus: 
S = {HH, HT, TH, TT} 
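
As a minimal sketch (not part of the original text), the sample spaces of Example 1.1 can be enumerated mechanically. The helper name `coin_sample_space` is a hypothetical choice made for this illustration.

```python
from itertools import product

def coin_sample_space(n_tosses):
    """Return the sample space of n_tosses coin tosses as a list of strings.

    Each sample point is a string of 'H' and 'T', one letter per toss.
    """
    return ["".join(outcome) for outcome in product("HT", repeat=n_tosses)]

S_once = coin_sample_space(1)    # part (a) of Example 1.1
S_twice = coin_sample_space(2)   # part (b) of Example 1.1
print(S_once)
print(S_twice)
```

The number of sample points doubles with each additional toss, since every existing sequence can be extended by either H or T.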


EXAMPLE 1.2 Find the sample space for the experiment of tossing a coin 
repeatedly and of counting the number of tosses required until the first head 
appears. 


Clearly all possible outcomes for this experiment are the terms of the sequence 
1, 2,3, ... Thus: 


S= {1,2,3,...} 
Note that there are an infinite number of outcomes. 


EXAMPLE 1.3 Find the sample space for the experiment of measuring (in hours) 
the lifetime of a transistor. 


Clearly all possible outcomes are all nonnegative real numbers. That is,

$$S = \{\tau : 0 \le \tau \le \infty\}$$

where $\tau$ represents the life of a transistor in hours.

Note that any particular experiment can often have many different sample 
spaces depending on the observation of interest (Probs. 1.1 and 1.2). A sample 
space S is said to be discrete if it consists of a finite number of sample points (as in 
Example 1.1) or countably infinite sample points (as in Example 1.2). A set is 
called countable if its elements can be placed in a one-to-one correspondence with 
the positive integers. A sample space S is said to be continuous if the sample points 
constitute a continuum (as in Example 1.3). 


C. Events: 


Since we have identified a sample space S as the set of all possible outcomes of a 
random experiment, we will review some set notations in the following. 
If $\zeta$ is an element of S (or belongs to S), then we write

$$\zeta \in S$$

If $\zeta$ is not an element of S (or does not belong to S), then we write

$$\zeta \notin S$$

A set A is called a subset of B, denoted by

$$A \subset B$$

if every element of A is also an element of B. Any subset of the sample space S is
called an event. A sample point of S is often referred to as an elementary event.
Note that the sample space S is the subset of itself: that is, $S \subset S$. Since S is the set
of all possible outcomes, it is often called the certain event.


EXAMPLE 1.4 Consider the experiment of Example 1.2. Let A be the event that 
the number of tosses required until the first head appears is even. Let B be the 
event that the number of tosses required until the first head appears is odd. Let C 
be the event that the number of tosses required until the first head appears is less 
than 5. Express events A, B, and C. 

A = {2,4,6,...} 

B= {1,3,5,...} 

C= {1,2,3,4} 


1.3 Algebra of Sets 


A. Set Operations: 


1. Equality:
Two sets A and B are equal, denoted A = B, if and only if $A \subset B$ and $B \subset A$.


2. Complementation:


Suppose $A \subset S$. The complement of set A, denoted $\bar{A}$, is the set containing all
elements in S but not in A.

$$\bar{A} = \{\zeta : \zeta \in S \text{ and } \zeta \notin A\}$$


3. Union:


The union of sets A and B, denoted $A \cup B$, is the set containing all elements in
either A or B or both.

$$A \cup B = \{\zeta : \zeta \in A \text{ or } \zeta \in B\}$$


4. Intersection:


The intersection of sets A and B, denoted $A \cap B$, is the set containing all elements
in both A and B.

$$A \cap B = \{\zeta : \zeta \in A \text{ and } \zeta \in B\}$$


5. Difference:


The difference of sets A and B, denoted $A \setminus B$, is the set containing all elements in A
but not in B.

$$A \setminus B = \{\zeta : \zeta \in A \text{ and } \zeta \notin B\}$$

Note that $A \setminus B = A \cap \bar{B}$.


6. Symmetrical Difference:


The symmetrical difference of sets A and B, denoted $A \,\Delta\, B$, is the set of all
elements that are in A or B but not in both.

$$A \,\Delta\, B = \{\zeta : \zeta \in A \text{ or } \zeta \in B \text{ and } \zeta \notin A \cap B\}$$

Note that $A \,\Delta\, B = (A \cap \bar{B}) \cup (\bar{A} \cap B) = (A \setminus B) \cup (B \setminus A)$.


7. Null Set:
The set containing no element is called the null set, denoted $\varnothing$. Note that

$$\varnothing = \bar{S}$$


8. Disjoint Sets:
Two sets A and B are called disjoint or mutually exclusive if they contain no
common element, that is, if $A \cap B = \varnothing$.

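The definitions of the union and intersection of two sets can be extended to any
finite number of sets as follows:

$$\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n = \{\zeta : \zeta \in A_1 \text{ or } \zeta \in A_2 \text{ or } \cdots\ \zeta \in A_n\}$$

$$\bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n = \{\zeta : \zeta \in A_1 \text{ and } \zeta \in A_2 \text{ and } \cdots\ \zeta \in A_n\}$$

Note that these definitions can be extended to an infinite number of sets:

$$\bigcup_{i=1}^{\infty} A_i = A_1 \cup A_2 \cup A_3 \cup \cdots$$

$$\bigcap_{i=1}^{\infty} A_i = A_1 \cap A_2 \cap A_3 \cap \cdots$$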


In our definition of event, we state that every subset of S is an event, including S
and the null set $\varnothing$. Then

S = the certain event
$\varnothing$ = the impossible event

If A and B are events in S, then

$\bar{A}$ = the event that A did not occur
$A \cup B$ = the event that either A or B or both occurred
$A \cap B$ = the event that both A and B occurred

Similarly, if $A_1, A_2, \ldots, A_n$ are a sequence of events in S, then

$\bigcup_{i=1}^{n} A_i$ = the event that at least one of the $A_i$ occurred
$\bigcap_{i=1}^{n} A_i$ = the event that all of the $A_i$ occurred


9. Partition of S: 


If $A_i \cap A_j = \varnothing$ for $i \ne j$ and $\bigcup_{i=1}^{k} A_i = S$, then the collection
$\{A_i;\ 1 \le i \le k\}$ is said to form a partition of S.


10. Size of Set: 


When sets are countable, the size (or cardinality) of set A, denoted |A|, is the
number of elements contained in A. When sets have a finite number of elements, it
is easy to see that size has the following properties:

(i) If $A \cap B = \varnothing$, then $|A \cup B| = |A| + |B|$.
(ii) $|\varnothing| = 0$.
(iii) If $A \subset B$, then $|A| \le |B|$.
(iv) $|A \cup B| + |A \cap B| = |A| + |B|$.


Note that the property (iv) can be easily seen if A and B are subsets of a line with 
length |A| and |B|, respectively. 


11. Product of Sets: 


The product (or Cartesian product) of sets A and B, denoted by $A \times B$, is the set of
ordered pairs of elements from A and B.

$$C = A \times B = \{(a, b) : a \in A,\ b \in B\}$$

Note that $A \times B \ne B \times A$, and $|C| = |A \times B| = |A| \times |B|$.

EXAMPLE 1.5 Let $A = \{a_1, a_2, a_3\}$ and $B = \{b_1, b_2\}$. Then

$$C = A \times B = \{(a_1, b_1), (a_1, b_2), (a_2, b_1), (a_2, b_2), (a_3, b_1), (a_3, b_2)\}$$

$$D = B \times A = \{(b_1, a_1), (b_1, a_2), (b_1, a_3), (b_2, a_1), (b_2, a_2), (b_2, a_3)\}$$
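
The set operations defined in this section map directly onto Python's built-in `set` type. The following is a small sketch under assumed example sets; the particular choices of `S`, `A`, and `B` are illustrative and do not come from the text.

```python
from itertools import product

S = {1, 2, 3, 4, 5, 6}   # assumed universal set for this illustration
A = {2, 4, 6}
B = {1, 2, 3}

complement_A = S - A            # complement of A relative to S
union = A | B                   # A ∪ B
intersection = A & B            # A ∩ B
difference = A - B              # A \ B
sym_diff = A ^ B                # A Δ B
cartesian = set(product(A, B))  # A × B as a set of ordered pairs

print(complement_A, union, intersection, difference, sym_diff)
print(len(cartesian))
```

The identities noted above, such as A \ B = A ∩ (complement of B) and |A × B| = |A| × |B|, can be checked on these sets by direct comparison.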


B. Venn Diagram: 


A graphical representation that is very useful for illustrating set operations is the
Venn diagram. For instance, in the three Venn diagrams shown in Fig. 1-1, the
shaded areas represent, respectively, the events $A \cup B$, $A \cap B$, and $\bar{A}$. The Venn
diagram in Fig. 1-2(a) indicates that $B \subset A$, and the event $A \cap \bar{B} = A \setminus B$ is shown as
the shaded area. In Fig. 1-2(b), the shaded area represents the event $A \,\Delta\, B$.


Fig. 1-1 (a) Shaded region: $A \cup B$; (b) shaded region: $A \cap B$; (c) shaded region: $\bar{A}$


Fig. 1-2 (a) $B \subset A$, shaded area: $A \cap \bar{B} = A \setminus B$; (b) shaded area: $A \,\Delta\, B$


C. Identities: 


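By the above set definitions or reference to Fig. 1-1, we obtain the following
identities:

$$\bar{S} = \varnothing \tag{1.1}$$
$$\bar{\varnothing} = S \tag{1.2}$$
$$\bar{\bar{A}} = A \tag{1.3}$$
$$S \cup A = S \tag{1.4}$$
$$S \cap A = A \tag{1.5}$$
$$A \cup \bar{A} = S \tag{1.6}$$
$$A \cap \bar{A} = \varnothing \tag{1.7}$$
$$A \cup \varnothing = A \tag{1.8}$$
$$A \cap \varnothing = \varnothing \tag{1.9}$$
$$A \setminus B = A \cap \bar{B} \tag{1.10}$$
$$S \setminus A = \bar{A} \tag{1.11}$$
$$A \setminus \varnothing = A \tag{1.12}$$
$$A \,\Delta\, B = (A \cap \bar{B}) \cup (\bar{A} \cap B) \tag{1.13}$$

The union and intersection operations also satisfy the following laws:

Commutative Laws:
$$A \cup B = B \cup A \tag{1.14}$$
$$A \cap B = B \cap A \tag{1.15}$$

Associative Laws:
$$A \cup (B \cup C) = (A \cup B) \cup C \tag{1.16}$$
$$A \cap (B \cap C) = (A \cap B) \cap C \tag{1.17}$$

Distributive Laws:
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \tag{1.18}$$
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \tag{1.19}$$

De Morgan’s Laws:
$$\overline{A \cup B} = \bar{A} \cap \bar{B} \tag{1.20}$$
$$\overline{A \cap B} = \bar{A} \cup \bar{B} \tag{1.21}$$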


These relations are verified by showing that any element that is contained in the set
on the left side of the equality sign is also contained in the set on the right side, and
vice versa. One way of showing this is by means of a Venn diagram (Prob. 1.14).
The distributive laws can be extended as follows:

$$A \cap \left( \bigcup_{i=1}^{n} B_i \right) = \bigcup_{i=1}^{n} (A \cap B_i) \tag{1.22}$$

$$A \cup \left( \bigcap_{i=1}^{n} B_i \right) = \bigcap_{i=1}^{n} (A \cup B_i) \tag{1.23}$$

Similarly, De Morgan’s laws also can be extended as follows (Prob. 1.21):

$$\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar{A}_i \tag{1.24}$$

$$\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \bar{A}_i \tag{1.25}$$
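
The identities above can also be spot-checked by brute force on a small universe. The sketch below is only an illustration, not a proof: it assumes a four-element set `S` (an arbitrary choice) and tests De Morgan's laws and the distributive laws over every combination of subsets. The helper names `all_subsets` and `comp` are hypothetical.

```python
from itertools import combinations

def all_subsets(universe):
    """Return every subset of a finite universe as a frozenset."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

S = frozenset({1, 2, 3, 4})   # assumed small universe for the check
subsets = all_subsets(S)

def comp(X):
    """Complement of X relative to S."""
    return S - X

# De Morgan's laws for every pair of subsets.
de_morgan_ok = all(
    comp(A | B) == comp(A) & comp(B) and comp(A & B) == comp(A) | comp(B)
    for A in subsets for B in subsets
)

# Distributive laws for every triple of subsets.
distributive_ok = all(
    A & (B | C) == (A & B) | (A & C) and A | (B & C) == (A | B) & (A | C)
    for A in subsets for B in subsets for C in subsets
)

print(de_morgan_ok, distributive_ok)
```

An exhaustive check on one finite universe does not replace the element-wise arguments of Probs. 1.13 and 1.15, but it is a quick way to catch a mistyped identity.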


1.4 Probability Space 
A. Event Space: 


We have defined that events are subsets of the sample space S. In order to be
precise, we say that a subset A of S can be an event if it belongs to a collection F of
subsets of S, satisfying the following conditions:

(i) $S \in F$    (1.26)
(ii) if $A \in F$, then $\bar{A} \in F$    (1.27)
(iii) if $A_i \in F$ for $i \ge 1$, then $\bigcup_{i=1}^{\infty} A_i \in F$    (1.28)

The collection F is called an event space. In mathematical literature, event space is
known as sigma field ($\sigma$-field) or $\sigma$-algebra.
Using the above conditions, we can show that if A and B are in F, then so are

$A \cap B$, $A \setminus B$, $A \,\Delta\, B$ (Prob. 1.22).


EXAMPLE 1.6 Consider the experiment of tossing a coin once in Example 1.1.
We have S = {H, T}. The sets $\{S, \varnothing\}$ and $\{S, \varnothing, H, T\}$ are event spaces, but $\{S, \varnothing, H\}$
is not an event space, since $\bar{H} = T$ is not in the set.
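
For a finite sample space, the three event-space conditions can be tested directly. The following is a minimal sketch, assuming that closure under pairwise unions suffices (which holds when the collection is finite); the function name `is_event_space` is hypothetical.

```python
def is_event_space(S, F):
    """Check whether the collection F of subsets of a finite set S is an event space.

    Conditions checked: S is in F, F is closed under complementation,
    and F is closed under (pairwise, hence finite) unions.
    """
    S = frozenset(S)
    F = {frozenset(A) for A in F}
    if S not in F:
        return False
    if any(S - A not in F for A in F):
        return False
    return all(A | B in F for A in F for B in F)

S = {"H", "T"}
print(is_event_space(S, [S, set()]))                 # {S, ∅}
print(is_event_space(S, [S, set(), {"H"}, {"T"}]))   # {S, ∅, H, T}
print(is_event_space(S, [S, set(), {"H"}]))          # {S, ∅, H}
```

The third collection fails the complement check because the complement of {H} is missing, which matches the reasoning in Example 1.6.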


B. Probability Space: 


An assignment of real numbers to the events defined in an event space F is known
as the probability measure P. Consider a random experiment with a sample space
S, and let A be a particular event defined in F. The probability of the event A is
denoted by P(A). Thus, the probability measure is a function defined over F. The
triplet (S, F, P) is known as the probability space.


C. Probability Measure 


a. Classical Definition: 


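Consider an experiment with equally likely finite outcomes. Then the classical
definition of probability of event A, denoted P(A), is defined by

$$P(A) = \frac{|A|}{|S|} \tag{1.29}$$

If A and B are disjoint, i.e., $A \cap B = \varnothing$, then $|A \cup B| = |A| + |B|$. Hence, in this case

$$P(A \cup B) = \frac{|A \cup B|}{|S|} = \frac{|A| + |B|}{|S|} = \frac{|A|}{|S|} + \frac{|B|}{|S|} = P(A) + P(B) \tag{1.30}$$

We also have

$$P(S) = \frac{|S|}{|S|} = 1 \tag{1.31}$$

$$P(\bar{A}) = \frac{|\bar{A}|}{|S|} = \frac{|S| - |A|}{|S|} = 1 - \frac{|A|}{|S|} = 1 - P(A) \tag{1.32}$$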
TS 


EXAMPLE 1.7 Consider an experiment of rolling a die. The outcome is

$$S = \{1, 2, 3, 4, 5, 6\}$$

Define:
A: the event that outcome is even, i.e., A = {2, 4, 6}
B: the event that outcome is odd, i.e., B = {1, 3, 5}
C: the event that outcome is 1 or prime, i.e., C = {1, 2, 3, 5}

Then

$$P(A) = \frac{|A|}{|S|} = \frac{3}{6} = \frac{1}{2}, \qquad P(B) = \frac{|B|}{|S|} = \frac{3}{6} = \frac{1}{2}, \qquad P(C) = \frac{|C|}{|S|} = \frac{4}{6} = \frac{2}{3}$$

Note that in the classical definition, P(A) is determined a priori without actual
experimentation, and the definition can be applied only to a limited class of
problems, namely those in which the outcomes are finite and equally likely
(equally probable).
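
The classical definition is a ratio of set sizes, so Example 1.7 can be reproduced with exact fractions. This is a sketch only; the function name `classical_probability` is an assumption made for illustration.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Classical probability |A| / |S| for equally likely outcomes, Eq. (1.29)."""
    return Fraction(len(event & sample_space), len(sample_space))

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {1, 3, 5}
C = {1, 2, 3, 5}

print(classical_probability(A, S))
print(classical_probability(B, S))
print(classical_probability(C, S))
```

Using `Fraction` rather than floating point keeps the results in the same reduced form as the hand computation.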


b. Relative Frequency Definition: 


Suppose that the random experiment is repeated n times. If event A occurs n(A)
times, then the probability of event A, denoted P(A), is defined as

$$P(A) = \lim_{n \to \infty} \frac{n(A)}{n} \tag{1.33}$$

where n(A)/n is called the relative frequency of event A. Note that this limit may
not exist, and in addition, there are many situations in which the concepts of
repeatability may not be valid. It is clear that for any event A, the relative
frequency of A will have the following properties:

1. $0 \le n(A)/n \le 1$, where n(A)/n = 0 if A occurs in none of the n repeated trials
and n(A)/n = 1 if A occurs in all of the n repeated trials.

2. If A and B are mutually exclusive events, then

$$n(A \cup B) = n(A) + n(B)$$

and

$$\frac{n(A \cup B)}{n} = \frac{n(A)}{n} + \frac{n(B)}{n}$$

$$P(A \cup B) = \lim_{n \to \infty} \frac{n(A \cup B)}{n} = \lim_{n \to \infty} \frac{n(A)}{n} + \lim_{n \to \infty} \frac{n(B)}{n} = P(A) + P(B) \tag{1.34}$$
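
The relative frequency definition can be illustrated by simulation. The sketch below assumes a fair six-sided die simulated with Python's pseudorandom generator; the function name `relative_frequency`, the seed, and the trial counts are choices made for this illustration, and a finite simulation only approximates the limit in Eq. (1.33).

```python
import random

def relative_frequency(event, n_trials, seed=0):
    """Return n(A)/n for simulated rolls of a fair die.

    event    : set of die faces that count as an occurrence of A
    n_trials : number of repetitions n
    seed     : fixed seed so that the same roll sequence is reproduced
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return hits / n_trials

A = {2, 4, 6}   # outcome is even
B = {1, 3}      # mutually exclusive with A

for n in (100, 10_000, 100_000):
    print(n, relative_frequency(A, n))
```

Because the same seed reproduces the same sequence of rolls, the relative frequencies of the mutually exclusive events `A` and `B` add up to that of their union, mirroring property 2.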


c. Axiomatic Definition: 

Consider a probability space (S, F, P). Let A be an event in F. Then in the
axiomatic definition, the probability P(A) of the event A is a real number assigned
to A which satisfies the following three axioms:

Axiom 1: $P(A) \ge 0$    (1.35)
Axiom 2: $P(S) = 1$    (1.36)
Axiom 3: $P(A \cup B) = P(A) + P(B)$ if $A \cap B = \varnothing$    (1.37)

If the sample space S is not finite, then axiom 3 must be modified as follows:
Axiom 3’: If $A_1, A_2, \ldots$ is an infinite sequence of mutually exclusive events in S
($A_i \cap A_j = \varnothing$ for $i \ne j$), then

$$P\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} P(A_i) \tag{1.38}$$


These axioms satisfy our intuitive notion of probability measure obtained from the 
notion of relative frequency. 


d. Elementary Properties of Probability: 


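By using the above axioms, the following useful properties of probability can be
obtained:

1. $P(\bar{A}) = 1 - P(A)$    (1.39)
2. $P(\varnothing) = 0$    (1.40)
3. $P(A) \le P(B)$ if $A \subset B$    (1.41)
4. $P(A) \le 1$    (1.42)
5. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$    (1.43)
6. $P(A \setminus B) = P(A) - P(A \cap B)$    (1.44)

7. If $A_1, A_2, \ldots, A_n$ are n arbitrary events in S, then

$$P\left( \bigcup_{i=1}^{n} A_i \right) = \sum_{i} P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n) \tag{1.45}$$

where the sum of the second term is over all distinct pairs of events, that of
the third term is over all distinct triples of events, and so forth.

8. If $A_1, A_2, \ldots, A_n$ is a finite sequence of mutually exclusive events in S
($A_i \cap A_j = \varnothing$ for $i \ne j$), then

$$P\left( \bigcup_{i=1}^{n} A_i \right) = \sum_{i=1}^{n} P(A_i) \tag{1.46}$$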


and a similar equality holds for any subcollection of the events. 
Note that property 4 can be easily derived from axiom 2 and property 3. 
Since $A \subset S$, we have

$$P(A) \le P(S) = 1$$

Thus, combining with axiom 1, we obtain

$$0 \le P(A) \le 1 \tag{1.47}$$

Property 5 implies that

$$P(A \cup B) \le P(A) + P(B) \tag{1.48}$$

since $P(A \cap B) \ge 0$ by axiom 1.

Property 6 implies that

$$P(A \setminus B) = P(A) - P(B) \quad \text{if } B \subset A \tag{1.49}$$

since $A \cap B = B$ if $B \subset A$.
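
The elementary properties can be checked numerically on a small finite probability space. The sketch below assumes a four-point sample space with hypothetical, unequal elementary probabilities (they are not taken from the text) and evaluates the inclusion-exclusion sum of property 7 for three events.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical probabilities of the elementary events 1..4 (they sum to 1).
p = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
S = frozenset(p)

def P(event):
    """Probability of an event as the sum of its elementary probabilities."""
    return sum((p[z] for z in event), Fraction(0))

def inclusion_exclusion(events):
    """Right-hand side of the inclusion-exclusion formula, Eq. (1.45)."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1          # (-1)^(k-1)
        for group in combinations(events, k):   # all distinct k-tuples of events
            total += sign * P(frozenset.intersection(*group))
    return total

A1, A2, A3 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
lhs = P(A1 | A2 | A3)
rhs = inclusion_exclusion([A1, A2, A3])
print(lhs, rhs)
```

Iterating over `combinations` visits each distinct pair and triple exactly once, which is the reading of "distinct pairs" and "distinct triples" used in property 7.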


1.5 Equally Likely Events 
A. Finite Sample Space: 


Consider a finite sample space S with n finite elements

$$S = \{\zeta_1, \zeta_2, \ldots, \zeta_n\}$$

where $\zeta_i$’s are elementary events. Let $P(\zeta_i) = p_i$. Then

1. $0 \le p_i \le 1$, $i = 1, 2, \ldots, n$    (1.50)

2. $\sum_{i=1}^{n} p_i = p_1 + p_2 + \cdots + p_n = 1$    (1.51)

3. If $A = \bigcup_{i \in I} \zeta_i$, where I is a collection of subscripts, then

$$P(A) = \sum_{\zeta_i \in A} P(\zeta_i) = \sum_{i \in I} p_i \tag{1.52}$$


B. Equally Likely Events:
When all elementary events $\zeta_i$ (i = 1, 2, ..., n) are equally likely, that is,

$$p_1 = p_2 = \cdots = p_n$$

then from Eq. (1.51), we have

$$p_i = \frac{1}{n} \qquad i = 1, 2, \ldots, n \tag{1.53}$$

and

$$P(A) = \frac{n(A)}{n} \tag{1.54}$$

where n(A) is the number of outcomes belonging to event A and n is the number of
sample points in S. [See classical definition (1.29).]
1.6 Conditional Probability 


A. Definition: 


The conditional probability of an event A given event B, denoted by P(A | B), is
defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad P(B) > 0 \tag{1.55}$$

where $P(A \cap B)$ is the joint probability of A and B. Similarly,

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \qquad P(A) > 0 \tag{1.56}$$

is the conditional probability of an event B given event A. From Eqs. (1.55) and
(1.56), we have

$$P(A \cap B) = P(A \mid B)P(B) = P(B \mid A)P(A) \tag{1.57}$$


Equation (1.57) is often quite useful in computing the joint probability of events. 


B. Bayes’ Rule: 


From Eq. (1.57) we can obtain the following Bayes’ rule: 


$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)} \tag{1.58}$$


1.7 Total Probability 


The events $A_1, A_2, \ldots, A_n$ are called mutually exclusive and exhaustive if

$$\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n = S \quad \text{and} \quad A_i \cap A_j = \varnothing \quad i \ne j \tag{1.59}$$

Let B be any event in S. Then

$$P(B) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(B \mid A_i)P(A_i) \tag{1.60}$$

which is known as the total probability of event B (Prob. 1.57). Let $A = A_i$ in Eq.
(1.58); then, using Eq. (1.60), we obtain

$$P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)P(A_j)} \tag{1.61}$$

Note that the terms on the right-hand side are all conditioned on events $A_i$, while
the term on the left is conditioned on B. Equation (1.61) is sometimes referred to as
Bayes’ theorem.
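
Equations (1.60) and (1.61) amount to a weighted sum followed by a normalization. The sketch below uses made-up numbers for the priors $P(A_i)$ and the conditional probabilities $P(B \mid A_i)$; both lists and the function names are assumptions for illustration only.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(B) = sum_i P(B | A_i) P(A_i), Eq. (1.60)."""
    return sum(l * p for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    """P(A_i | B) for every i, Eq. (1.61)."""
    p_b = total_probability(priors, likelihoods)
    return [l * p / p_b for p, l in zip(priors, likelihoods)]

# Hypothetical values: three mutually exclusive and exhaustive events A_1, A_2, A_3.
priors = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]           # P(A_i)
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]   # P(B | A_i)

print(total_probability(priors, likelihoods))
print(bayes_posteriors(priors, likelihoods))
```

The posteriors always sum to 1, because the denominator in Eq. (1.61) is exactly the sum of the numerators over all i.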
1.8 Independent Events 
Two events A and B are said to be (statistically) independent if and only if 
$$P(A \cap B) = P(A)P(B) \tag{1.62}$$

It follows immediately that if A and B are independent, then by Eqs. (1.55) and
(1.56),

$$P(A \mid B) = P(A) \quad \text{and} \quad P(B \mid A) = P(B) \tag{1.63}$$

If two events A and B are independent, then it can be shown that A and $\bar{B}$ are also
independent; that is (Prob. 1.63),

$$P(A \cap \bar{B}) = P(A)P(\bar{B}) \tag{1.64}$$

Then

$$P(A \mid \bar{B}) = \frac{P(A \cap \bar{B})}{P(\bar{B})} = P(A) \tag{1.65}$$

Thus, if A is independent of B, then the probability of A’s occurrence is unchanged
by information as to whether or not B has occurred. Three events A, B, C are said
to be independent if and only if

$$\begin{aligned}
P(A \cap B \cap C) &= P(A)P(B)P(C) \\
P(A \cap B) &= P(A)P(B) \\
P(A \cap C) &= P(A)P(C) \\
P(B \cap C) &= P(B)P(C)
\end{aligned} \tag{1.66}$$

We may also extend the definition of independence to more than three events. The
events $A_1, A_2, \ldots, A_n$ are independent if and only if for every subset
$\{A_{i_1}, A_{i_2}, \ldots, A_{i_k}\}$ ($2 \le k \le n$) of these events,

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_k}) \tag{1.67}$$


Finally, we define an infinite set of events to be independent if and only if every 
finite subset of these events is independent. 

To distinguish between the mutual exclusiveness (or disjointness) and 
independence of a collection of events, we summarize as follows: 


1. If $\{A_i,\ i = 1, 2, \ldots, n\}$ is a sequence of mutually exclusive events, then

$$P\left( \bigcup_{i=1}^{n} A_i \right) = \sum_{i=1}^{n} P(A_i) \tag{1.68}$$

2. If $\{A_i,\ i = 1, 2, \ldots, n\}$ is a sequence of independent events, then

$$P\left( \bigcap_{i=1}^{n} A_i \right) = \prod_{i=1}^{n} P(A_i) \tag{1.69}$$


and a similar equality holds for any subcollection of the events. 
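
The difference between independence and mutual exclusiveness can be made concrete on the two-dice sample space. The events chosen below are illustrative assumptions (they do not appear in the text), and the helper names `P` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair dice: 36 equally likely ordered pairs (i, j).
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Classical probability |event| / |S| as an exact fraction."""
    return Fraction(len(event), len(S))

def independent(E1, E2):
    """Check the product rule P(E1 ∩ E2) = P(E1) P(E2) of Eq. (1.62)."""
    return P(E1 & E2) == P(E1) * P(E2)

A = {(i, j) for (i, j) in S if i % 2 == 0}   # first die shows an even number
B = {(i, j) for (i, j) in S if i + j == 7}   # sum of the dots equals 7
D = {(i, j) for (i, j) in S if i + j == 8}   # sum of the dots equals 8

print(independent(A, B))   # product rule for A and B
print(B & D == set())      # B and D share no sample point
print(independent(B, D))   # product rule for B and D
```

Two mutually exclusive events with nonzero probabilities cannot be independent, since their joint probability is 0 while the product of their probabilities is positive.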


SOLVED PROBLEMS 


Sample Space and Events 


1.1. Consider a random experiment of tossing a coin three times.

(a) Find the sample space $S_1$ if we wish to observe the exact sequences of
heads and tails obtained.

(b) Find the sample space $S_2$ if we wish to observe the number of heads in
the three tosses.


(a) The sample space $S_1$ is given by

$$S_1 = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$$

where, for example, HTH indicates a head on the first and third throws
and a tail on the second throw. There are eight sample points in $S_1$.


(b) The sample space $S_2$ is given by

$$S_2 = \{0, 1, 2, 3\}$$

where, for example, the outcome 2 indicates that two heads were
obtained in the three tosses. The sample space $S_2$ contains four sample
points.


1.2. Consider an experiment of drawing two cards at random from a bag
containing four cards marked with the integers 1 through 4.


(a) Find the sample space $S_1$ of the experiment if the first card is replaced
before the second is drawn.

(b) Find the sample space $S_2$ of the experiment if the first card is not
replaced.


(a) The sample space $S_1$ contains 16 ordered pairs (i, j), $1 \le i \le 4$, $1 \le j \le 4$,
where the first number indicates the first number drawn. Thus,

$$S_1 = \begin{Bmatrix}
(1,1) & (1,2) & (1,3) & (1,4) \\
(2,1) & (2,2) & (2,3) & (2,4) \\
(3,1) & (3,2) & (3,3) & (3,4) \\
(4,1) & (4,2) & (4,3) & (4,4)
\end{Bmatrix}$$


(b) The sample space $S_2$ contains 12 ordered pairs (i, j), $i \ne j$, $1 \le i \le 4$, $1 \le j \le 4$,
where the first number indicates the first number drawn. Thus,

$$S_2 = \begin{Bmatrix}
(1,2) & (1,3) & (1,4) \\
(2,1) & (2,3) & (2,4) \\
(3,1) & (3,2) & (3,4) \\
(4,1) & (4,2) & (4,3)
\end{Bmatrix}$$


1.3. An experiment consists of rolling a die until a 6 is obtained.

(a) Find the sample space $S_1$ if we are interested in all possibilities.

(b) Find the sample space $S_2$ if we are interested in the number of throws
needed to get a 6.


(a) The sample space $S_1$ would be

$$\begin{aligned}
S_1 = \{ & 6, \\
& 16, 26, 36, 46, 56, \\
& 116, 126, 136, 146, 156, \ldots \}
\end{aligned}$$

where the first line indicates that a 6 is obtained in one throw, the second
line indicates that a 6 is obtained in two throws, and so forth.


(b) In this case, the sample space $S_2$ is

$$S_2 = \{i : i = 1, 2, 3, \ldots\}$$

where i is an integer representing the number of throws needed to get a
6.


1.4. Find the sample space for the experiment consisting of measurement of the
voltage output v from a transducer, the maximum and minimum of which are
+5 and -5 volts, respectively.


A suitable sample space for this experiment would be 


$$S = \{v : -5 \le v \le 5\}$$


1.5. An experiment consists of tossing two dice. 
(a) Find the sample space S. 
(b) Find the event A that the sum of the dots on the dice equals 7. 
(c) Find the event B that the sum of the dots on the dice is greater than 10. 
(d) Find the event C that the sum of the dots on the dice is greater than 12. 


(a) For this experiment, the sample space S consists of 36 points (Fig. 1-3):

$$S = \{(i, j) : i, j = 1, 2, 3, 4, 5, 6\}$$

where i represents the number of dots appearing on one die and j
represents the number of dots appearing on the other die.


Fig. 1-3


(b) The event A consists of 6 points (see Fig. 1-3):

$$A = \{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)\}$$

(c) The event B consists of 3 points (see Fig. 1-3):

$$B = \{(5, 6), (6, 5), (6, 6)\}$$

(d) The event C is an impossible event, that is, $C = \varnothing$.
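
The sample space and events of Prob. 1.5 are small enough to enumerate by machine. The following sketch simply filters the 36 ordered pairs by their sum; the variable names follow the problem statement.

```python
from itertools import product

# Sample space: all ordered pairs (i, j) with i, j = 1, ..., 6.
S = list(product(range(1, 7), repeat=2))

A = [(i, j) for (i, j) in S if i + j == 7]    # sum of the dots equals 7
B = [(i, j) for (i, j) in S if i + j > 10]    # sum of the dots is greater than 10
C = [(i, j) for (i, j) in S if i + j > 12]    # sum of the dots is greater than 12

print(len(S))
print(A)
print(B)
print(C)
```

The empty list obtained for `C` corresponds to the impossible event, since the largest attainable sum is 12.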


1.6. An automobile dealer offers vehicles with the following options:
(a) With or without automatic transmission

(b) With or without air-conditioning

(c) With one of two choices of a stereo system

(d) With one of three exterior colors

If the sample space consists of the set of all possible vehicle types, what is
the number of outcomes in the sample space?


The tree diagram for the different types of vehicles is shown in Fig. 1-4.
From Fig. 1-4 we see that the number of sample points in S is 2 × 2 × 2 × 3 =
24.


Fig. 1-4 Tree diagram: transmission (automatic, manual), air-conditioning (yes, no), stereo (1, 2), color


1.7. State every possible event in the sample space S = {a, b, c, d}.


There are $2^4 = 16$ possible events in S. They are $\varnothing$; {a}, {b}, {c}, {d}; {a, b},
{a, c}, {a, d}, {b, c}, {b, d}, {c, d}; {a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}; S
= {a, b, c, d}.


1.8. How many events are there in a sample space S with n elementary events?


Let $S = \{s_1, s_2, \ldots, s_n\}$. Let $\Omega$ be the family of all subsets of S. ($\Omega$ is
sometimes referred to as the power set of S.) Let $S_i$ be the set consisting of
two statements, that is,

$$S_i = \{\text{Yes, the } s_i \text{ is in; No, the } s_i \text{ is not in}\}$$

Then $\Omega$ can be represented as the Cartesian product

$$\Omega = S_1 \times S_2 \times \cdots \times S_n = \{(s_1, s_2, \ldots, s_n) : s_i \in S_i \text{ for } i = 1, 2, \ldots, n\}$$

Since each subset of S can be uniquely characterized by an element in the
above Cartesian product, we obtain the number of elements in $\Omega$ by

$$n(\Omega) = n(S_1)\,n(S_2) \cdots n(S_n) = 2^n$$

where $n(S_i)$ = number of elements in $S_i$ = 2.

An alternative way of finding $n(\Omega)$ is by the following summation:

$$n(\Omega) = \sum_{i=0}^{n} \binom{n}{i} = \sum_{i=0}^{n} \frac{n!}{i!\,(n-i)!}$$

The last sum is an expansion of $(1+1)^n = 2^n$.
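
Both counting arguments of Prob. 1.8 can be checked on the four-element sample space of Prob. 1.7. This is a sketch only; the helper name `power_set` is hypothetical.

```python
from itertools import combinations
from math import comb

def power_set(S):
    """Return all subsets (events) of a finite sample space S as a list of sets."""
    items = sorted(S)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

S = {"a", "b", "c", "d"}
events = power_set(S)
n = len(S)

print(len(events))                              # number of events found by enumeration
print(sum(comb(n, i) for i in range(n + 1)))    # binomial sum from the alternative argument
print(2 ** n)
```

Grouping the subsets by size r is exactly the binomial-sum argument: there are `comb(n, r)` events containing r elementary events.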
Algebra of Sets 
1.9. Consider the experiment of Example 1.2. We define the events 


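$$A = \{k : k \text{ is odd}\}$$
$$B = \{k : 4 \le k \le 7\}$$
$$C = \{k : 1 \le k \le 10\}$$

where k is the number of tosses required until the first H (head) appears.

Determine the events $\bar{A}$, $\bar{B}$, $\bar{C}$, $A \cup B$, $B \cup C$, $A \cap B$, $A \cap C$, $B \cap C$, and
$\bar{A} \cap B$.


$\bar{A} = \{k : k \text{ is even}\} = \{2, 4, 6, \ldots\}$

$\bar{B} = \{k : k = 1, 2, 3 \text{ or } k \ge 8\}$

$\bar{C} = \{k : k \ge 11\}$

$A \cup B = \{k : k \text{ is odd or } k = 4, 6\}$

$B \cup C = C$

$A \cap B = \{5, 7\}$

$A \cap C = \{1, 3, 5, 7, 9\}$

$B \cap C = B$

$\bar{A} \cap B = \{4, 6\}$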


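1.10. Consider the experiment of Example 1.7 of rolling a die. Express
$A \cup B$, $A \cap C$, $\bar{B}$, $\bar{C}$, $B \setminus C$, $C \setminus B$, $B \,\Delta\, C$.


From Example 1.7, we have S = {1, 2, 3, 4, 5, 6}, A = {2, 4, 6}, B = {1, 3,
5}, and C = {1, 2, 3, 5}.
Then

$A \cup B = \{1, 2, 3, 4, 5, 6\} = S$, $A \cap C = \{2\}$, $\bar{B} = \{2, 4, 6\} = A$, $\bar{C} = \{4, 6\}$

$B \setminus C = B \cap \bar{C} = \varnothing$, $C \setminus B = C \cap \bar{B} = \{2\}$,

$B \,\Delta\, C = (B \setminus C) \cup (C \setminus B) = \varnothing \cup \{2\} = \{2\}$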
1.11. The sample space of an experiment is the real line expressed as

$$S = \{v : -\infty < v < \infty\}$$


(a) Consider the events 


Determine the events 


(a) It is clear that 


$$\bigcup_{i=1}^{\infty} A_i = \{v : 0 \le v < 1\}$$

Noting that the $A_i$’s are mutually exclusive, we have

$$\bigcap_{i=1}^{\infty} A_i = \varnothing$$


1.12. Consider the switching networks shown in Fig. 1-5. Let $A_1$, $A_2$, and $A_3$
denote the events that the switches $s_1$, $s_2$, and $s_3$ are closed, respectively. Let
$A_{ab}$ denote the event that there is a closed path between terminals a and b.
Express $A_{ab}$ in terms of $A_1$, $A_2$, and $A_3$ for each of the networks shown.


Fig. 1-5

(a) From Fig. 1-5(a), we see that there is a closed path between a and b only
if all switches $s_1$, $s_2$, and $s_3$ are closed. Thus,

$$A_{ab} = A_1 \cap A_2 \cap A_3$$

(b) From Fig. 1-5(b), we see that there is a closed path between a and b if at
least one switch is closed. Thus,

$$A_{ab} = A_1 \cup A_2 \cup A_3$$

(c) From Fig. 1-5(c), we see that there is a closed path between a and b if $s_1$
and either $s_2$ or $s_3$ are closed. Thus,

$$A_{ab} = A_1 \cap (A_2 \cup A_3)$$

Using the distributive law (1.18), we have

$$A_{ab} = (A_1 \cap A_2) \cup (A_1 \cap A_3)$$

which indicates that there is a closed path between a and b if $s_1$ and $s_2$ or
$s_1$ and $s_3$ are closed.

(d) From Fig. 1-5(d), we see that there is a closed path between a and b if
either $s_1$ and $s_2$ are closed or $s_3$ is closed. Thus

$$A_{ab} = (A_1 \cap A_2) \cup A_3$$
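1.13. Verify the distributive law (1.18).

Let $s \in [A \cap (B \cup C)]$. Then $s \in A$ and $s \in (B \cup C)$. This means either that
$s \in A$ and $s \in B$ or that $s \in A$ and $s \in C$; that is, $s \in (A \cap B)$ or $s \in (A \cap C)$.
Therefore,

$$A \cap (B \cup C) \subset [(A \cap B) \cup (A \cap C)]$$

Next, let $s \in [(A \cap B) \cup (A \cap C)]$. Then $s \in A$ and $s \in B$ or $s \in A$ and $s \in C$.
Thus, $s \in A$ and ($s \in B$ or $s \in C$). Thus,

$$[(A \cap B) \cup (A \cap C)] \subset A \cap (B \cup C)$$

Thus, by the definition of equality, we have

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$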
1.14. Using a Venn diagram, repeat Prob. 1.13. 


Fig. 1-6 shows the sequence of relevant Venn diagrams. Comparing Fig. 1-6(b)
and 1-6(e), we conclude that

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$


Fig. 1-6 (a) Shaded region: $B \cup C$; (b) shaded region: $A \cap (B \cup C)$; (c) shaded region: $A \cap B$; (d) shaded region: $A \cap C$; (e) shaded region: $(A \cap B) \cup (A \cap C)$


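1.15. Verify De Morgan’s law (1.21)

$$\overline{A \cap B} = \bar{A} \cup \bar{B}$$

Suppose that $s \in \overline{A \cap B}$; then $s \notin A \cap B$. So $s \notin$ {both A and B}. This means
that either $s \notin A$ or $s \notin B$ or $s \notin A$ and $s \notin B$. This implies that $s \in \bar{A} \cup \bar{B}$.
Conversely, suppose that $s \in \bar{A} \cup \bar{B}$, that is, either $s \in \bar{A}$ or $s \in \bar{B}$ or $s \in$ {both
$\bar{A}$ and $\bar{B}$}. Then it follows that $s \notin A$ or $s \notin B$ or $s \notin$ {both A and B}; that is,
$s \notin A \cap B$ or $s \in \overline{A \cap B}$. Thus, we conclude that $\overline{A \cap B} = \bar{A} \cup \bar{B}$.


Note that De Morgan’s law can also be shown by using a Venn diagram.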


1.16. Let A and B be arbitrary events. Show that $A \subset B$ if and only if $A \cap B = A$.


“If” part: We show that if $A \cap B = A$, then $A \subset B$. Let $s \in A$. Then $s \in (A \cap B)$,
since $A = A \cap B$. Then by the definition of intersection, $s \in B$. Therefore,
$A \subset B$.


“Only if” part: We show that if $A \subset B$, then $A \cap B = A$. Note that from the
definition of the intersection, $(A \cap B) \subset A$. Suppose $s \in A$. If $A \subset B$, then $s \in B$.
So $s \in A$ and $s \in B$; that is, $s \in (A \cap B)$. Therefore, it follows that $A \subset (A \cap B)$.
Hence, $A = A \cap B$. This completes the proof.


1.17. Let A be an arbitrary event in S and let $\varnothing$ be the null event. Verify Eqs. (1.8)
and (1.9), i.e.

(a) $A \cup \varnothing = A$    (1.8)
(b) $A \cap \varnothing = \varnothing$    (1.9)


(a) $A \cup \varnothing = \{s : s \in A \text{ or } s \in \varnothing\}$
But, by definition, there are no $s \in \varnothing$. Thus,

$$A \cup \varnothing = \{s : s \in A\} = A$$

(b) $A \cap \varnothing = \{s : s \in A \text{ and } s \in \varnothing\}$

But, since there are no $s \in \varnothing$, there cannot be an s such that $s \in A$ and
$s \in \varnothing$. Thus,

$$A \cap \varnothing = \varnothing$$

Note that Eq. (1.9) shows that $\varnothing$ is mutually exclusive with every other
event, including itself.


1.18. Show that the null (or empty) set $\varnothing$ is a subset of every set A.


From the definition of intersection, it follows that

$$(A \cap B) \subset A \quad \text{and} \quad (A \cap B) \subset B \tag{1.70}$$

for any pair of events, whether they are mutually exclusive or not. If A and B
are mutually exclusive events, that is, $A \cap B = \varnothing$, then by Eq. (1.70) we
obtain

$$\varnothing \subset A \quad \text{and} \quad \varnothing \subset B \tag{1.71}$$

Therefore, for any event A,

$$\varnothing \subset A$$

that is, $\varnothing$ is a subset of every set A.


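1.19. Show that A and B are disjoint if and only if $A \setminus B = A$.


First, if $A \setminus B = A \cap \bar{B} = A$, then

$$A \cap B = (A \cap \bar{B}) \cap B = A \cap (\bar{B} \cap B) = A \cap \varnothing = \varnothing$$

and A and B are disjoint.
Next, if A and B are disjoint, then $A \cap B = \varnothing$, and $A \setminus B = A \cap \bar{B} = A$.
Thus, A and B are disjoint if and only if $A \setminus B = A$.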


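1.20. Show that there is a distribution law also for difference; that is,

$$(A \setminus B) \cap C = (A \cap C) \setminus (B \cap C) \tag{1.72}$$

By Eq. (1.10) and applying commutative and associative laws, we have

$$(A \setminus B) \cap C = (A \cap \bar{B}) \cap C = A \cap (\bar{B} \cap C) = (A \cap C) \cap \bar{B}$$

Next,

$$\begin{aligned}
(A \cap C) \setminus (B \cap C) &= (A \cap C) \cap \overline{(B \cap C)} && \text{by Eq. (1.10)} \\
&= (A \cap C) \cap (\bar{B} \cup \bar{C}) && \text{by Eq. (1.21)} \\
&= [(A \cap C) \cap \bar{B}] \cup [(A \cap C) \cap \bar{C}] && \text{by Eq. (1.18)} \\
&= [(A \cap C) \cap \bar{B}] \cup [A \cap (C \cap \bar{C})] && \text{by Eq. (1.17)} \\
&= [(A \cap C) \cap \bar{B}] \cup [A \cap \varnothing] && \text{by Eq. (1.7)} \\
&= [(A \cap C) \cap \bar{B}] \cup \varnothing && \text{by Eq. (1.9)} \\
&= (A \cap C) \cap \bar{B} && \text{by Eq. (1.8)}
\end{aligned}$$

Thus, we have

$$(A \setminus B) \cap C = (A \cap C) \setminus (B \cap C)$$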


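1.21. Verify Eqs. (1.24) and (1.25).


(a) Suppose first that $s \in \overline{\bigcup_{i=1}^{n} A_i}$; then $s \notin \bigcup_{i=1}^{n} A_i$.

That is, if s is not contained in any of the events $A_i$, i = 1, 2, ..., n, then s
is contained in $\bar{A}_i$ for all i = 1, 2, ..., n. Thus

$$s \in \bigcap_{i=1}^{n} \bar{A}_i$$

Next, we assume that

$$s \in \bigcap_{i=1}^{n} \bar{A}_i$$

Then s is contained in $\bar{A}_i$ for all i = 1, 2, ..., n, which means that s is not
contained in $A_i$ for any i = 1, 2, ..., n, implying that

$$s \notin \bigcup_{i=1}^{n} A_i$$

Thus,

$$s \in \overline{\bigcup_{i=1}^{n} A_i}$$

This proves Eq. (1.24).


(b) Using Eqs. (1.24) and (1.3), we have

$$\overline{\bigcup_{i=1}^{n} \bar{A}_i} = \bigcap_{i=1}^{n} \bar{\bar{A}}_i = \bigcap_{i=1}^{n} A_i$$

Taking complements of both sides of the above yields

$$\bigcup_{i=1}^{n} \bar{A}_i = \overline{\bigcap_{i=1}^{n} A_i}$$

which is Eq. (1.25).
Probability Space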
Probability Space 


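1.22. Consider a probability space (S, F, P). Show that if A and B are in an event
space ($\sigma$-field) F, so are $A \cap B$, $A \setminus B$, and $A \,\Delta\, B$.


By condition (ii) of F, Eq. (1.27), if $A, B \in F$, then $\bar{A}, \bar{B} \in F$. Now by De
Morgan’s law (1.21), we have

$$\overline{A \cap B} = \bar{A} \cup \bar{B} \in F \qquad \text{by Eq. (1.28)}$$
$$A \cap B = \overline{\overline{A \cap B}} \in F \qquad \text{by Eq. (1.27)}$$

Similarly, we see that

$$A \cap \bar{B} \in F \quad \text{and} \quad \bar{A} \cap B \in F$$

Now by Eq. (1.10), we have

$$A \setminus B = A \cap \bar{B} \in F$$

Finally, by Eq. (1.13), and Eq. (1.28), we see that

$$A \,\Delta\, B = (A \cap \bar{B}) \cup (\bar{A} \cap B) \in F$$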


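1.23. Consider the experiment of Example 1.7 of rolling a die. Show that $\{S, \varnothing, A, B\}$
is an event space but $\{S, \varnothing, A\}$ and $\{S, \varnothing, A, B, C\}$ are not event spaces.


Let $F = \{S, \varnothing, A, B\}$. Then we see that

$$S \in F, \quad \bar{S} = \varnothing \in F, \quad \bar{\varnothing} = S \in F, \quad \bar{A} = B \in F, \quad \bar{B} = A \in F$$

and

$$S \cup \varnothing = S \cup A = S \cup B = S \in F, \quad \varnothing \cup A = A \cup \varnothing = A \in F,$$
$$\varnothing \cup B = B \cup \varnothing = B \in F, \quad A \cup B = B \cup A = S \in F$$

Thus, we conclude that $\{S, \varnothing, A, B\}$ is an event space ($\sigma$-field).
Next, let $F = \{S, \varnothing, A\}$. Now $\bar{A} = B \notin F$. Thus $\{S, \varnothing, A\}$ is not an event
space. Finally, let $F = \{S, \varnothing, A, B, C\}$, but $\bar{C} = \{4, 6\} \notin F$. Hence, $\{S, \varnothing, A,
B, C\}$ is not an event space.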
1.24. Using the axioms of probability, prove Eq. (1.39).
We have

$$S = A \cup \bar{A} \quad \text{and} \quad A \cap \bar{A} = \varnothing$$

Thus, by axioms 2 and 3, it follows that

$$P(S) = 1 = P(A) + P(\bar{A})$$

from which we obtain

$$P(\bar{A}) = 1 - P(A)$$

1.25. Verify Eq. (1.40).
From Eq. (1.39), we have

$$P(A) = 1 - P(\bar{A})$$

Let $A = \varnothing$. Then, by Eq. (1.2), $\bar{A} = \bar{\varnothing} = S$, and by axiom 2 we obtain

$$P(\varnothing) = 1 - P(S) = 1 - 1 = 0$$
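1.26. Verify Eq. (1.41).


Let $A \subset B$. Then from the Venn diagram shown in Fig. 1-7, we see that

$$B = A \cup (\bar{A} \cap B) \quad \text{and} \quad A \cap (\bar{A} \cap B) = \varnothing \tag{1.73}$$


Fig. 1-7 Shaded region: $\bar{A} \cap B$


Hence, from axiom 3,

$$P(B) = P(A) + P(\bar{A} \cap B)$$

However, by axiom 1, $P(\bar{A} \cap B) \ge 0$. Thus, we conclude that

$$P(A) \le P(B) \quad \text{if } A \subset B$$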
1.27. Verify Eq. (1.43).


From the Venn diagram of Fig. 1-8, each of the sets $A \cup B$ and B can be
represented, respectively, as a union of mutually exclusive sets as follows:

$$A \cup B = A \cup (\bar{A} \cap B) \quad \text{and} \quad B = (A \cap B) \cup (\bar{A} \cap B) \tag{1.74}$$


Fig. 1-8 Shaded regions: $\bar{A} \cap B$ and $A \cap B$


Thus, by axiom 3,

$$P(A \cup B) = P(A) + P(\bar{A} \cap B) \tag{1.75}$$

and

$$P(B) = P(A \cap B) + P(\bar{A} \cap B) \tag{1.76}$$

From Eq. (1.76), we have

$$P(\bar{A} \cap B) = P(B) - P(A \cap B) \tag{1.77}$$

Substituting Eq. (1.77) into Eq. (1.75), we obtain

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
1.28. Let P(A) = 0.9 and P(B) = 0.8. Show that $P(A \cap B) \ge 0.7$.
From Eq. (1.43), we have

$$P(A \cap B) = P(A) + P(B) - P(A \cup B)$$

By Eq. (1.47), $0 \le P(A \cup B) \le 1$. Hence,

$$P(A \cap B) \ge P(A) + P(B) - 1 \tag{1.78}$$

Substituting the given values of P(A) and P(B) in Eq. (1.78), we get

$$P(A \cap B) \ge 0.9 + 0.8 - 1 = 0.7$$

Equation (1.78) is known as Bonferroni’s inequality.
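1.29. Show that
$P(A) = P(A \cap B) + P(A \cap \bar{B})$ (1.79)
From the Venn diagram of Fig. 1-9, we see that

$A = (A \cap B) \cup (A \cap \bar{B})$ and $(A \cap B) \cap (A \cap \bar{B}) = \varnothing$ (1.80)

Fig. 1-9

Thus, by axiom 3, we have
$P(A) = P(A \cap B) + P(A \cap \bar{B})$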


1.30. Given that P(A) = 0.9, P(B) = 0.8, and $P(A \cap B) = 0.75$, find (a) $P(A \cup B)$;
(b) $P(A \cap \bar{B})$; and (c) $P(\bar{A} \cap \bar{B})$.

(a) By Eq. (1.43), we have
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.9 + 0.8 - 0.75 = 0.95$
(b) By Eq. (1.79) (Prob. 1.29), we have
$P(A \cap \bar{B}) = P(A) - P(A \cap B) = 0.9 - 0.75 = 0.15$

(c) By De Morgan's law, Eq. (1.20), and Eq. (1.39) and using the result from
part (a), we get

$P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - 0.95 = 0.05$
1.31. For any three events $A_1$, $A_2$, and $A_3$, show that
$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2)$
$\quad - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)$ (1.81)

Let $B = A_2 \cup A_3$. By Eq. (1.43), we have

$P(A_1 \cup B) = P(A_1) + P(B) - P(A_1 \cap B)$ (1.82)
Using distributive law (1.18), we have

$A_1 \cap B = A_1 \cap (A_2 \cup A_3) = (A_1 \cap A_2) \cup (A_1 \cap A_3)$

Applying Eq. (1.43) to the above event, we obtain
$P(A_1 \cap B) = P(A_1 \cap A_2) + P(A_1 \cap A_3) - P[(A_1 \cap A_2) \cap (A_1 \cap A_3)]$
$= P(A_1 \cap A_2) + P(A_1 \cap A_3) - P(A_1 \cap A_2 \cap A_3)$ (1.83)

Applying Eq. (1.43) to the set $B = A_2 \cup A_3$, we have

$P(B) = P(A_2 \cup A_3) = P(A_2) + P(A_3) - P(A_2 \cap A_3)$ (1.84)

Substituting Eqs. (1.84) and (1.83) into Eq. (1.82), we get

$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3)$
$\quad - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)$
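
Equation (1.81) can be sanity-checked numerically. The sketch below is not part of the original solution: it assumes a hypothetical sample space of 12 equally likely points and three arbitrarily chosen events, and compares both sides of Eq. (1.81) with exact fractions.

```python
from fractions import Fraction

# Hypothetical sample space (an assumption for illustration): 12 equally likely points.
S = set(range(1, 13))
A1 = {n for n in S if n % 2 == 0}   # even numbers
A2 = {n for n in S if n % 3 == 0}   # multiples of 3
A3 = {n for n in S if n > 8}        # numbers greater than 8


def prob(event):
    """Probability of an event when all points of S are equally likely."""
    return Fraction(len(event), len(S))


# Left-hand side of Eq. (1.81): probability of the union.
lhs = prob(A1 | A2 | A3)

# Right-hand side of Eq. (1.81): inclusion-exclusion over three events.
rhs = (prob(A1) + prob(A2) + prob(A3)
       - prob(A1 & A2) - prob(A1 & A3) - prob(A2 & A3)
       + prob(A1 & A2 & A3))

print(lhs, rhs)
```

Any other choice of finite events should give equal values for `lhs` and `rhs`; the particular sets here carry no special meaning.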


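1.32. Prove that

$P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i)$ (1.85)

which is known as Boole's inequality.

We will prove Eq. (1.85) by induction. Suppose Eq. (1.85) is true for n = k.

$P\left(\bigcup_{i=1}^{k} A_i\right) \le \sum_{i=1}^{k} P(A_i)$

Then

$P\left(\bigcup_{i=1}^{k+1} A_i\right) = P\left[\left(\bigcup_{i=1}^{k} A_i\right) \cup A_{k+1}\right]$
$\le P\left(\bigcup_{i=1}^{k} A_i\right) + P(A_{k+1})$ [by Eq. (1.48)]
$\le \sum_{i=1}^{k} P(A_i) + P(A_{k+1}) = \sum_{i=1}^{k+1} P(A_i)$

Thus, Eq. (1.85) is also true for n = k + 1. By Eq. (1.48), Eq. (1.85) is true for
n = 2. Thus, Eq. (1.85) is true for $n \ge 2$.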


1.33. Verify Eq. (1.46). 


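Again we prove it by induction. Suppose Eq. (1.46) is true for n = k.

$P\left(\bigcup_{i=1}^{k} A_i\right) = \sum_{i=1}^{k} P(A_i)$

Then

$P\left(\bigcup_{i=1}^{k+1} A_i\right) = P\left[\left(\bigcup_{i=1}^{k} A_i\right) \cup A_{k+1}\right]$

Using the distributive law (1.22), we have

$\left(\bigcup_{i=1}^{k} A_i\right) \cap A_{k+1} = \bigcup_{i=1}^{k} (A_i \cap A_{k+1}) = \bigcup_{i=1}^{k} \varnothing = \varnothing$

since $A_i \cap A_j = \varnothing$ for $i \ne j$. Thus, by axiom 3, we have

$P\left(\bigcup_{i=1}^{k+1} A_i\right) = P\left(\bigcup_{i=1}^{k} A_i\right) + P(A_{k+1}) = \sum_{i=1}^{k+1} P(A_i)$

which indicates that Eq. (1.46) is also true for n = k + 1. By axiom 3, Eq.
(1.46) is true for n = 2. Thus, it is true for $n \ge 2$.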


1.34. A sequence of events $\{A_n, n \ge 1\}$ is said to be an increasing sequence if
[Fig. 1-10(a)]

$A_1 \subset A_2 \subset \cdots \subset A_k \subset A_{k+1} \subset \cdots$ (1.86a)

whereas it is said to be a decreasing sequence if [Fig. 1-10(b)]

$A_1 \supset A_2 \supset \cdots \supset A_k \supset A_{k+1} \supset \cdots$ (1.86b)

Fig. 1-10

If $\{A_n, n \ge 1\}$ is an increasing sequence of events, we define a new event $A_\infty$
by

$A_\infty = \lim_{n \to \infty} A_n = \bigcup_{i=1}^{\infty} A_i$ (1.87)

Similarly, if $\{A_n, n \ge 1\}$ is a decreasing sequence of events, we define a new
event $A_\infty$ by

$A_\infty = \lim_{n \to \infty} A_n = \bigcap_{i=1}^{\infty} A_i$ (1.88)

Show that if $\{A_n, n \ge 1\}$ is either an increasing or a decreasing sequence of
events, then

$\lim_{n \to \infty} P(A_n) = P(A_\infty)$ (1.89)

which is known as the continuity theorem of probability.


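If $\{A_n, n \ge 1\}$ is an increasing sequence of events, then by definition

$\bigcup_{i=1}^{n-1} A_i = A_{n-1}$

Now, we define the events $B_n$, $n \ge 1$, by

$B_1 = A_1$
$B_2 = A_2 \cap \bar{A}_1$
$\vdots$
$B_n = A_n \cap \bar{A}_{n-1}$

Thus, $B_n$ consists of those elements in $A_n$ that are not in any of the earlier $A_k$,
$k < n$. From the Venn diagram shown in Fig. 1-11, it is seen that the $B_n$ are
mutually exclusive events such that

$\bigcup_{i=1}^{n} B_i = \bigcup_{i=1}^{n} A_i$ for all $n \ge 1$, and $\bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i = A_\infty$

Thus, using axiom 3', we have

$P(A_\infty) = P\left(\bigcup_{i=1}^{\infty} A_i\right) = P\left(\bigcup_{i=1}^{\infty} B_i\right) = \sum_{i=1}^{\infty} P(B_i)$
$= \lim_{n \to \infty} \sum_{i=1}^{n} P(B_i) = \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} B_i\right)$ (1.90)
$= \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} A_i\right) = \lim_{n \to \infty} P(A_n)$

Next, if $\{A_n, n \ge 1\}$ is a decreasing sequence, then $\{\bar{A}_n, n \ge 1\}$ is an
increasing sequence. Hence, by Eq. (1.89), we have

$P\left(\bigcup_{i=1}^{\infty} \bar{A}_i\right) = \lim_{n \to \infty} P(\bar{A}_n)$

From Eq. (1.25),

$\bigcup_{i=1}^{\infty} \bar{A}_i = \overline{\bigcap_{i=1}^{\infty} A_i}$

Thus,

$P\left(\overline{\bigcap_{i=1}^{\infty} A_i}\right) = \lim_{n \to \infty} P(\bar{A}_n)$ (1.91)

Using Eq. (1.39), Eq. (1.91) reduces to

$1 - P\left(\bigcap_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} [1 - P(A_n)] = 1 - \lim_{n \to \infty} P(A_n)$

Thus,

$P\left(\bigcap_{i=1}^{\infty} A_i\right) = P(A_\infty) = \lim_{n \to \infty} P(A_n)$ (1.92)

Combining Eqs. (1.90) and (1.92), we obtain Eq. (1.89).


Equally Likely Events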


1.35. Consider a telegraph source generating two symbols, dots and dashes. We 
observed that the dots were twice as likely to occur as the dashes. Find the 


probabilities of the dots occurring and the dashes occurring. 


From the observation, we have

$P(\text{dot}) = 2P(\text{dash})$

Then, by Eq. (1.51),

$P(\text{dot}) + P(\text{dash}) = 3P(\text{dash}) = 1$

Thus,

$P(\text{dash}) = \frac{1}{3}$ and $P(\text{dot}) = \frac{2}{3}$


1.36. The sample space S of a random experiment is given by

S = {a, b, c, d}

with probabilities P(a) = 0.2, P(b) = 0.3, P(c) = 0.4, and P(d) = 0.1. Let A
denote the event {a, b}, and B the event {b, c, d}. Determine the following
probabilities: (a) $P(A)$; (b) $P(B)$; (c) $P(\bar{A})$; (d) $P(A \cup B)$; and (e) $P(A \cap B)$.

Using Eq. (1.52), we obtain

(a) $P(A) = P(a) + P(b) = 0.2 + 0.3 = 0.5$
(b) $P(B) = P(b) + P(c) + P(d) = 0.3 + 0.4 + 0.1 = 0.8$
(c) $\bar{A} = \{c, d\}$; $P(\bar{A}) = P(c) + P(d) = 0.4 + 0.1 = 0.5$
(d) $A \cup B = \{a, b, c, d\} = S$; $P(A \cup B) = P(S) = 1$
(e) $A \cap B = \{b\}$; $P(A \cap B) = P(b) = 0.3$


1.37. An experiment consists of observing the sum of the dice when two fair dice
are thrown (Prob. 1.5). Find (a) the probability that the sum is 7 and (b) the
probability that the sum is greater than 10.


(a) Let $\zeta_{ij}$ denote the elementary event (sampling point) consisting of the
following outcome: $\zeta_{ij} = (i, j)$, where i represents the number appearing
on one die and j represents the number appearing on the other die. Since
the dice are fair, all the outcomes are equally likely. So $P(\zeta_{ij}) = \frac{1}{36}$. Let A
denote the event that the sum is 7. Since the events $\zeta_{ij}$ are mutually
exclusive and from Fig. 1-3 (Prob. 1.5), we have

$P(A) = P(\zeta_{16} \cup \zeta_{25} \cup \zeta_{34} \cup \zeta_{43} \cup \zeta_{52} \cup \zeta_{61})$
$= P(\zeta_{16}) + P(\zeta_{25}) + P(\zeta_{34}) + P(\zeta_{43}) + P(\zeta_{52}) + P(\zeta_{61})$
$= 6\left(\frac{1}{36}\right) = \frac{1}{6}$

(b) Let B denote the event that the sum is greater than 10. Then from Fig. 1-3,
we obtain

$P(B) = P(\zeta_{56} \cup \zeta_{65} \cup \zeta_{66}) = P(\zeta_{56}) + P(\zeta_{65}) + P(\zeta_{66})$
$= 3\left(\frac{1}{36}\right) = \frac{1}{12}$
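
The counting argument above can be reproduced by listing all 36 outcomes. The following is a minimal sketch, assuming two fair six-sided dice; the helper name `prob` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (i, j) of throwing two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(predicate):
    """Fraction of outcomes for which predicate(outcome) is true."""
    favorable = sum(1 for o in outcomes if predicate(o))
    return Fraction(favorable, len(outcomes))


p_sum_7 = prob(lambda o: o[0] + o[1] == 7)       # part (a)
p_sum_gt_10 = prob(lambda o: o[0] + o[1] > 10)   # part (b)

print(p_sum_7, p_sum_gt_10)
```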


1.38. There are n persons in a room.

(a) What is the probability that at least two persons have the same birthday?
(b) Calculate this probability for n = 50.
(c) How large need n be for this probability to be greater than 0.5?

(a) As each person can have his or her birthday on any one of 365 days
(ignoring the possibility of February 29), there are a total of $(365)^n$
possible outcomes. Let A be the event that no two persons have the same
birthday. Then the number of outcomes belonging to A is

$n(A) = (365)(364) \cdots (365 - n + 1)$

Assuming that each outcome is equally likely, then by Eq. (1.54),

$P(A) = \frac{n(A)}{n(S)} = \frac{(365)(364) \cdots (365 - n + 1)}{(365)^n}$ (1.93)

Let B be the event that at least two persons have the same birthday. Then
$B = \bar{A}$ and by Eq. (1.39), $P(B) = 1 - P(A)$.

(b) Substituting n = 50 in Eq. (1.93), we have
$P(A) \approx 0.03$ and $P(B) \approx 1 - 0.03 = 0.97$

(c) From Eq. (1.93), when n = 23, we have

$P(A) \approx 0.493$ and $P(B) = 1 - P(A) \approx 0.507$

That is, if there are 23 persons in a room, the probability that at least two
of them have the same birthday exceeds 0.5.
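
Equation (1.93) is easy to evaluate numerically. The sketch below assumes, as the problem does, 365 equally likely birthdays; the function names are illustrative and not from the text.

```python
def p_no_shared_birthday(n, days=365):
    """Eq. (1.93): probability that n persons all have different birthdays."""
    p = 1.0
    for i in range(n):
        p *= (days - i) / days
    return p


def smallest_n_exceeding(threshold=0.5):
    """Smallest n for which P(at least two share a birthday) > threshold."""
    n = 1
    while 1.0 - p_no_shared_birthday(n) <= threshold:
        n += 1
    return n


print(1.0 - p_no_shared_birthday(50))  # part (b)
print(smallest_n_exceeding(0.5))       # part (c)
```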


1.39. A committee of 5 persons is to be selected randomly from a group of 5 men 
and 10 women. 


(a) Find the probability that the committee consists of 2 men and 3 women. 
(b) Find the probability that the committee consists of all women. 


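(a) The number of total outcomes is given by

$n(S) = \binom{15}{5}$

It is assumed that “random selection” means that each of the outcomes is
equally likely. Let A be the event that the committee consists of 2 men
and 3 women. Then the number of outcomes belonging to A is given by

$n(A) = \binom{5}{2}\binom{10}{3}$

Thus, by Eq. (1.54),

$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{5}{2}\binom{10}{3}}{\binom{15}{5}} = \frac{400}{1001} \approx 0.4$

(b) Let B be the event that the committee consists of all women. Then the
number of outcomes belonging to B is

$n(B) = \binom{5}{0}\binom{10}{5}$

Thus, by Eq. (1.54),

$P(B) = \frac{n(B)}{n(S)} = \frac{\binom{5}{0}\binom{10}{5}}{\binom{15}{5}} = \frac{252}{3003} = \frac{12}{143} \approx 0.084$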
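
The binomial coefficients in this problem can be evaluated directly. This is a small sketch using Python's `math.comb` (available from Python 3.8); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n_total = comb(15, 5)               # committees of 5 chosen from 15 persons

# (a) 2 of the 5 men and 3 of the 10 women.
p_two_men_three_women = Fraction(comb(5, 2) * comb(10, 3), n_total)

# (b) no men and 5 of the 10 women.
p_all_women = Fraction(comb(5, 0) * comb(10, 5), n_total)

print(p_two_men_three_women, float(p_two_men_three_women))
print(p_all_women, float(p_all_women))
```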


1.40. Consider the switching network shown in Fig. 1-12. It is equally likely that a 
switch will or will not work. Find the probability that a closed path will exist 
between terminals a and b. 


Fig. 1-12 (switches $s_1$, $s_2$, $s_3$, $s_4$ between terminals a and b)


Consider a sample space S of which a typical outcome is (1, 0, 0, 1),
indicating that switches 1 and 4 are closed and switches 2 and 3 are open.
The sample space contains $2^4 = 16$ points, and by assumption, they are
equally likely (Fig. 1-13).


0000 
0001 
0010 
0011 
0100 
0101 
0110 
0111 
1000 
1001 
1010 
1011 
1100 
1101 
1110 
1111 


Fig. 1-13


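Let $A_i$, i = 1, 2, 3, 4, be the event that the switch $s_i$ is closed. Let A be the
event that there exists a closed path between a and b. Then

$A = A_1 \cup (A_2 \cap A_3) \cup (A_2 \cap A_4)$

Applying Eq. (1.45), we have

$P(A) = P[A_1 \cup (A_2 \cap A_3) \cup (A_2 \cap A_4)]$
$= P(A_1) + P(A_2 \cap A_3) + P(A_2 \cap A_4)$
$\quad - P[A_1 \cap (A_2 \cap A_3)] - P[A_1 \cap (A_2 \cap A_4)] - P[(A_2 \cap A_3) \cap (A_2 \cap A_4)]$
$\quad + P[A_1 \cap (A_2 \cap A_3) \cap (A_2 \cap A_4)]$
$= P(A_1) + P(A_2 \cap A_3) + P(A_2 \cap A_4)$
$\quad - P(A_1 \cap A_2 \cap A_3) - P(A_1 \cap A_2 \cap A_4) - P(A_2 \cap A_3 \cap A_4)$
$\quad + P(A_1 \cap A_2 \cap A_3 \cap A_4)$

Now, for example, the event $A_2 \cap A_3$ contains all elementary events with a 1
in the second and third places. Thus, from Fig. 1-13, we see that

$n(A_1) = 8$, $n(A_2 \cap A_3) = 4$, $n(A_2 \cap A_4) = 4$,
$n(A_1 \cap A_2 \cap A_3) = 2$, $n(A_1 \cap A_2 \cap A_4) = 2$,
$n(A_2 \cap A_3 \cap A_4) = 2$, $n(A_1 \cap A_2 \cap A_3 \cap A_4) = 1$

Thus,

$P(A) = \frac{8}{16} + \frac{4}{16} + \frac{4}{16} - \frac{2}{16} - \frac{2}{16} - \frac{2}{16} + \frac{1}{16} = \frac{11}{16} \approx 0.688$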
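
The same result can be obtained by brute force over the 16 switch states. The sketch below assumes the network topology implied by the event expression above (switch 1 in parallel with switch 2 in series with the parallel pair 3 and 4); the function name `path_closed` is illustrative.

```python
from fractions import Fraction
from itertools import product


def path_closed(s1, s2, s3, s4):
    """True if a closed path exists: A = A1 or (A2 and A3) or (A2 and A4)."""
    return bool(s1 or (s2 and (s3 or s4)))


# All 2^4 = 16 equally likely switch states (1 = closed, 0 = open).
states = list(product((0, 1), repeat=4))
p_path = Fraction(sum(path_closed(*s) for s in states), len(states))

print(p_path, float(p_path))
```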


1.41. Consider the experiment of tossing a fair coin repeatedly and counting the 
number of tosses required until the first head appears. 


(a) Find the sample space of the experiment. 
(b) Find the probability that the first head appears on the kth toss.
(c) Verify that P(S) = 1. 


(a) The sample space of this experiment is

$S = \{e_1, e_2, e_3, \ldots\} = \{e_k : k = 1, 2, 3, \ldots\}$

where $e_k$ is the elementary event that the first head appears on the kth
toss.

(b) Since a fair coin is tossed, we assume that a head and a tail are equally
likely to appear. Then $P(H) = P(T) = \frac{1}{2}$. Let

$P(e_k) = p_k$, $k = 1, 2, 3, \ldots$

Since there are $2^k$ equally likely ways of tossing a fair coin k times, only
one of which consists of (k - 1) tails followed by a head, we observe that

$P(e_k) = p_k = \frac{1}{2^k}$, $k = 1, 2, 3, \ldots$ (1.94)

(c) Using the power series summation formula, we have

$P(S) = \sum_{k=1}^{\infty} P(e_k) = \sum_{k=1}^{\infty} \frac{1}{2^k} = \frac{1/2}{1 - 1/2} = 1$ (1.95)


1.42. Consider the experiment of Prob. 1.41. 


(a) Find the probability that the first head appears on an even-numbered 
toss. 


(b) Find the probability that the first head appears on an odd-numbered toss. 


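(a) Let A be the event “the first head appears on an even-numbered toss.”
Then, by Eq. (1.52) and using Eq. (1.94) of Prob. 1.41, we have

$P(A) = p_2 + p_4 + p_6 + \cdots = \sum_{m=1}^{\infty} p_{2m} = \sum_{m=1}^{\infty} \frac{1}{2^{2m}} = \sum_{m=1}^{\infty} \left(\frac{1}{4}\right)^m = \frac{1/4}{1 - 1/4} = \frac{1}{3}$

(b) Let B be the event “the first head appears on an odd-numbered toss.”
Then it is obvious that $B = \bar{A}$. Then, by Eq. (1.39), we get

$P(B) = P(\bar{A}) = 1 - P(A) = 1 - \frac{1}{3} = \frac{2}{3}$

As a check, notice that

$P(B) = p_1 + p_3 + p_5 + \cdots = \sum_{m=0}^{\infty} p_{2m+1} = \sum_{m=0}^{\infty} \frac{1}{2^{2m+1}} = \frac{1}{2} \sum_{m=0}^{\infty} \left(\frac{1}{4}\right)^m = \frac{1}{2} \left(\frac{1}{1 - 1/4}\right) = \frac{2}{3}$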
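
The series in Probs. 1.41 and 1.42 can be checked with partial sums. This sketch truncates the infinite sums at N = 60 terms (an arbitrary cutoff chosen for illustration), so the results agree with 1, 1/3, and 2/3 only up to a tiny truncation error.

```python
from fractions import Fraction


def p_first_head(k):
    """Eq. (1.94): probability that the first head appears on toss k."""
    return Fraction(1, 2 ** k)


N = 60  # truncation point for the infinite sums (assumed for illustration)
total = sum(p_first_head(k) for k in range(1, N + 1))
even = sum(p_first_head(k) for k in range(2, N + 1, 2))
odd = sum(p_first_head(k) for k in range(1, N + 1, 2))

print(float(total), float(even), float(odd))
```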
Conditional Probability 
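1.43. Show that $P(A \mid B)$ defined by Eq. (1.55) satisfies the three axioms of a
probability, that is,
(a) $P(A \mid B) \ge 0$ (1.96)
(b) $P(S \mid B) = 1$ (1.97)
(c) $P(A_1 \cup A_2 \mid B) = P(A_1 \mid B) + P(A_2 \mid B)$ if $A_1 \cap A_2 = \varnothing$ (1.98)

(a) From definition (1.55),

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, $P(B) > 0$

By axiom 1, $P(A \cap B) \ge 0$. Thus,
$P(A \mid B) \ge 0$

(b) By Eq. (1.5), $S \cap B = B$. Then

$P(S \mid B) = \frac{P(S \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$
(c) By definition (1.55),
$P(A_1 \cup A_2 \mid B) = \frac{P[(A_1 \cup A_2) \cap B]}{P(B)}$
Now by Eqs. (1.14) and (1.17), we have
$(A_1 \cup A_2) \cap B = (A_1 \cap B) \cup (A_2 \cap B)$

and $A_1 \cap A_2 = \varnothing$ implies that $(A_1 \cap B) \cap (A_2 \cap B) = \varnothing$. Thus, by axiom 3 we
get

$P(A_1 \cup A_2 \mid B) = \frac{P(A_1 \cap B) + P(A_2 \cap B)}{P(B)} = \frac{P(A_1 \cap B)}{P(B)} + \frac{P(A_2 \cap B)}{P(B)}$
$= P(A_1 \mid B) + P(A_2 \mid B)$ if $A_1 \cap A_2 = \varnothing$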
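1.44. Find $P(A \mid B)$ if (a) $A \cap B = \varnothing$, (b) $A \subset B$, and (c) $B \subset A$.

(a) If $A \cap B = \varnothing$, then $P(A \cap B) = P(\varnothing) = 0$. Thus,

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\varnothing)}{P(B)} = 0$

(b) If $A \subset B$, then $A \cap B = A$ and

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)}$

(c) If $B \subset A$, then $A \cap B = B$ and

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$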


1.45. Show that if $P(A \mid B) > P(A)$, then $P(B \mid A) > P(B)$.

If $P(A \mid B) = \frac{P(A \cap B)}{P(B)} > P(A)$, then $P(A \cap B) > P(A)P(B)$. Thus,
$P(B \mid A) = \frac{P(A \cap B)}{P(A)} > \frac{P(A)P(B)}{P(A)} = P(B)$ or $P(B \mid A) > P(B)$
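1.46. Show that
$P(\bar{A} \mid B) = 1 - P(A \mid B)$ (1.99)

By Eqs. (1.6) and (1.7), we have
$A \cup \bar{A} = S$, $A \cap \bar{A} = \varnothing$
Then by Eqs. (1.97) and (1.98), we get
$P(A \cup \bar{A} \mid B) = P(S \mid B) = 1 = P(A \mid B) + P(\bar{A} \mid B)$
Thus we obtain
$P(\bar{A} \mid B) = 1 - P(A \mid B)$
Note that Eq. (1.99) is similar to property 1 (Eq. (1.39)).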
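1.47. Show that

$P(A_1 \cup A_2 \mid B) = P(A_1 \mid B) + P(A_2 \mid B) - P(A_1 \cap A_2 \mid B)$ (1.100)

Using a Venn diagram, we see that
$A_1 \cup A_2 = A_1 \cup (A_2 \setminus A_1) = A_1 \cup (A_2 \cap \bar{A}_1)$
and
$A_1 \cap (A_2 \cap \bar{A}_1) = \varnothing$
By Eq. (1.98) we have
$P(A_1 \cup A_2 \mid B) = P[A_1 \cup (A_2 \cap \bar{A}_1) \mid B] = P(A_1 \mid B) + P(A_2 \cap \bar{A}_1 \mid B)$ (1.100)
Again, using a Venn diagram, we see that
$A_2 = (A_1 \cap A_2) \cup (A_2 \setminus A_1) = (A_1 \cap A_2) \cup (A_2 \cap \bar{A}_1)$
and
$(A_1 \cap A_2) \cap (A_2 \cap \bar{A}_1) = \varnothing$
By Eq. (1.98) we have
$P(A_2 \mid B) = P[(A_1 \cap A_2) \cup (A_2 \cap \bar{A}_1) \mid B] = P(A_1 \cap A_2 \mid B) + P(A_2 \cap \bar{A}_1 \mid B)$
Thus,
$P(A_2 \cap \bar{A}_1 \mid B) = P(A_2 \mid B) - P(A_1 \cap A_2 \mid B)$ (1.101)
Substituting Eq. (1.101) into Eq. (1.100), we obtain
$P(A_1 \cup A_2 \mid B) = P(A_1 \mid B) + P(A_2 \mid B) - P(A_1 \cap A_2 \mid B)$
Note that Eq. (1.100) is similar to property 5 (Eq. (1.43)).


1.48. Consider the experiment of throwing the two fair dice of Prob. 1.37 behind
you; you are then informed that the sum is not greater than 3.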


(a) Find the probability of the event that two faces are the same without the 
information given. 


(b) Find the probability of the same event with the information given. 


(a) Let A be the event that two faces are the same. Then from Fig. 1-3 (Prob.
1.5) and by Eq. (1.54), we have

$A = \{(i, i) : i = 1, 2, \ldots, 6\}$

and

$P(A) = \frac{n(A)}{n(S)} = \frac{6}{36} = \frac{1}{6}$

(b) Let B be the event that the sum is not greater than 3. Again from Fig. 1-3,
we see that

$B = \{(i, j) : i + j \le 3\} = \{(1, 1), (1, 2), (2, 1)\}$
and

$P(B) = \frac{n(B)}{n(S)} = \frac{3}{36} = \frac{1}{12}$

Now $A \cap B$ is the event that two faces are the same and also that their
sum is not greater than 3. Thus,

$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{1}{36}$

Then by definition (1.55), we obtain

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/36}{1/12} = \frac{1}{3}$

Note that the probability of the event that two faces are the same doubled
from $\frac{1}{6}$ to $\frac{1}{3}$ with the information given.

Alternative Solution:

There are 3 elements in B, and 1 of them belongs to A. Thus, the probability
of the same event with the information given is $\frac{1}{3}$.
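
The alternative solution amounts to counting inside the reduced sample space B. A minimal sketch of that idea, assuming equally likely outcomes (the helper `cond_prob` is a name introduced here, not from the text):

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))      # 36 equally likely outcomes
A = {(i, j) for (i, j) in S if i == j}       # two faces are the same
B = {(i, j) for (i, j) in S if i + j <= 3}   # sum is not greater than 3


def cond_prob(event, given):
    """P(event | given) for equally likely outcomes: n(event and given) / n(given)."""
    return Fraction(len(event & given), len(given))


p_a = Fraction(len(A), len(S))   # part (a): without the information
p_a_given_b = cond_prob(A, B)    # part (b): with the information

print(p_a, p_a_given_b)
```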


1.49. Two manufacturing plants produce similar parts. Plant 1 produces 1,000
parts, 100 of which are defective. Plant 2 produces 2,000 parts, 150 of which
are defective. A part is selected at random and found to be defective. What is
the probability that it came from plant 1?

Let B be the event that “the part selected is defective,” and let A be the event
that “the part selected came from plant 1.” Then $A \cap B$ is the event that the
item selected is defective and came from plant 1. Since a part is selected at
random, we assume equally likely events, and using Eq. (1.54), we have

$P(A \cap B) = \frac{100}{3000} = \frac{1}{30}$

Similarly, since there are 3000 parts and 250 of them are defective, we have

$P(B) = \frac{250}{3000} = \frac{1}{12}$

By Eq. (1.55), the probability that the part came from plant 1 is

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/30}{1/12} = \frac{12}{30} = 0.4$

Alternative Solution:

There are 250 defective parts, and 100 of these are from plant 1. Thus, the
probability that the defective part came from plant 1 is $\frac{100}{250} = 0.4$.


1.50. Mary has two children. One child is a boy. What is the probability that the
other child is a girl?


Let S be the sample space of all possible outcomes, S = {BB, BG, GB, GG},
where BB denotes the event that the first child is a boy and the second is also a
boy, BG denotes the event that the first child is a boy and the second is a girl,
and so on.

Now we have

$P[\{BB\}] = P[\{BG\}] = P[\{GB\}] = P[\{GG\}] = 1/4$

Let B be the event that there is at least one boy, B = {BB, BG, GB}, and let A
be the event that there is at least one girl, A = {BG, GB, GG}.

Then

$A \cap B = \{BG, GB\}$ and $P(A \cap B) = 1/2$
$P(B) = 3/4$

Now, by Eq. (1.55) we have

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/2}{3/4} = \frac{2}{3}$

Note that one would intuitively think the answer is 1/2 because the second
event looks independent of the first. This problem illustrates that the initial
intuition can be misleading.


1.51. A lot of 100 semiconductor chips contains 20 that are defective. Two chips
are selected at random, without replacement, from the lot.


(a) What is the probability that the first one selected is defective? 


(b) What is the probability that the second one selected is defective given 
that the first one was defective? 


(c) What is the probability that both are defective? 


(a) Let A denote the event that the first one selected is defective. Then, by
Eq. (1.54),

$P(A) = \frac{20}{100} = 0.2$

(b) Let B denote the event that the second one selected is defective. After
the first one selected is defective, there are 99 chips left in the lot, with
19 chips that are defective. Thus, the probability that the second one
selected is defective given that the first one was defective is

$P(B \mid A) = \frac{19}{99} \approx 0.192$

(c) By Eq. (1.57), the probability that both are defective is
$P(A \cap B) = P(B \mid A)P(A) = \left(\frac{19}{99}\right)(0.2) \approx 0.0384$


1.52. A number is selected at random from {1, 2, ..., 100}. Given that the number
selected is divisible by 2, find the probability that it is divisible by 3 or 5.

Let

$A_2$ = event that the number is divisible by 2
$A_3$ = event that the number is divisible by 3
$A_5$ = event that the number is divisible by 5

Then the desired probability is

$P(A_3 \cup A_5 \mid A_2) = \frac{P[(A_3 \cup A_5) \cap A_2]}{P(A_2)}$ [Eq. (1.55)]
$= \frac{P[(A_3 \cap A_2) \cup (A_5 \cap A_2)]}{P(A_2)}$ [Eq. (1.18)]
$= \frac{P(A_3 \cap A_2) + P(A_5 \cap A_2) - P(A_3 \cap A_5 \cap A_2)}{P(A_2)}$ [Eq. (1.43)]

Now

$A_3 \cap A_2$ = event that the number is divisible by 6
$A_5 \cap A_2$ = event that the number is divisible by 10
$A_3 \cap A_5 \cap A_2$ = event that the number is divisible by 30

and

$P(A_3 \cap A_2) = \frac{16}{100}$, $P(A_5 \cap A_2) = \frac{10}{100}$, $P(A_3 \cap A_5 \cap A_2) = \frac{3}{100}$

Thus,

$P(A_3 \cup A_5 \mid A_2) = \frac{\frac{16}{100} + \frac{10}{100} - \frac{3}{100}}{\frac{50}{100}} = \frac{23}{50} = 0.46$
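
The result can be confirmed by direct counting. This short sketch simply lists the even numbers from 1 to 100 and counts those divisible by 3 or 5; variable names are illustrative.

```python
from fractions import Fraction

numbers = range(1, 101)
even = [n for n in numbers if n % 2 == 0]                # conditioning event A_2
hits = [n for n in even if n % 3 == 0 or n % 5 == 0]     # (A_3 or A_5) within A_2

p = Fraction(len(hits), len(even))
print(len(even), len(hits), p, float(p))
```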
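1.53. Show that
$P(A \cap B \cap C) = P(A)P(B \mid A)P(C \mid A \cap B)$ (1.102)

By definition Eq. (1.55) we have

$P(A)P(B \mid A)P(C \mid A \cap B) = P(A) \frac{P(B \cap A)}{P(A)} \frac{P(C \cap A \cap B)}{P(A \cap B)}$
$= P(A \cap B \cap C)$

since $P(B \cap A) = P(A \cap B)$ and $P(C \cap A \cap B) = P(A \cap B \cap C)$.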
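1.54. Let $A_1, A_2, \ldots, A_n$ be events in a sample space S. Show that
$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1})$ (1.103)
We prove Eq. (1.103) by induction. Suppose Eq. (1.103) is true for n = k:
$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) \cdots P(A_k \mid A_1 \cap A_2 \cap \cdots \cap A_{k-1})$
Multiplying both sides by $P(A_{k+1} \mid A_1 \cap A_2 \cap \cdots \cap A_k)$, we have
$P(A_1 \cap A_2 \cap \cdots \cap A_k)P(A_{k+1} \mid A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1 \cap A_2 \cap \cdots \cap A_{k+1})$
and

$P(A_1 \cap A_2 \cap \cdots \cap A_{k+1}) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) \cdots P(A_{k+1} \mid A_1 \cap A_2 \cap \cdots \cap A_k)$

Thus, Eq. (1.103) is also true for n = k + 1. By Eq. (1.57), Eq. (1.103) is true
for n = 2. Thus, Eq. (1.103) is true for $n \ge 2$.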


1.55. Two cards are drawn at random from a deck. Find the probability that both
are aces.

Let A be the event that the first card is an ace, and let B be the event that the
second card is an ace. The desired probability is $P(B \cap A)$. Since a card is
drawn at random, $P(A) = \frac{4}{52}$. Now if the first card is an ace, then there will be
3 aces left in the deck of 51 cards. Thus, $P(B \mid A) = \frac{3}{51}$. By Eq. (1.57),

$P(B \cap A) = P(B \mid A)P(A) = \left(\frac{3}{51}\right)\left(\frac{4}{52}\right) = \frac{1}{221}$

Check:
By counting technique, we have

$P(B \cap A) = \frac{\binom{4}{2}}{\binom{52}{2}} = \frac{(4)(3)}{(52)(51)} = \frac{1}{221}$


1.56. There are two identical decks of cards, each possessing a distinct symbol so
that the cards from each deck can be identified. One deck of cards is laid out
in a fixed order, and the other deck is shuffled and the cards laid out one by
one on top of the fixed deck. Whenever two cards with the same symbol
occur in the same position, we say that a match has occurred. Let the number
of cards in the deck be 10. Find the probability of getting a match at the first
four positions.

Let $A_i$, i = 1, 2, 3, 4, be the events that a match occurs at the ith position. The
required probability is

$P(A_1 \cap A_2 \cap A_3 \cap A_4)$
By Eq. (1.103),
$P(A_1 \cap A_2 \cap A_3 \cap A_4) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2)P(A_4 \mid A_1 \cap A_2 \cap A_3)$

There are 10 cards that can go into position 1, only one of which matches.
Thus, $P(A_1) = \frac{1}{10}$. $P(A_2 \mid A_1)$ is the conditional probability of a match at
position 2 given a match at position 1. Now there are 9 cards left to go into
position 2, only one of which matches. Thus, $P(A_2 \mid A_1) = \frac{1}{9}$. In a similar
fashion, we obtain $P(A_3 \mid A_1 \cap A_2) = \frac{1}{8}$ and $P(A_4 \mid A_1 \cap A_2 \cap A_3) = \frac{1}{7}$. Thus,

$P(A_1 \cap A_2 \cap A_3 \cap A_4) = \left(\frac{1}{10}\right)\left(\frac{1}{9}\right)\left(\frac{1}{8}\right)\left(\frac{1}{7}\right) = \frac{1}{5040}$


Total Probability 
1.57. Verify Eq. (1.60). 


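Since $B \cap S = B$ [and using Eq. (1.59)], we have
$B = B \cap S = B \cap (A_1 \cup A_2 \cup \cdots \cup A_n)$
$= (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)$

Now the events $B \cap A_i$, i = 1, 2, ..., n, are mutually exclusive, as seen from
the Venn diagram of Fig. 1-14. Then by axiom 3 of probability and Eq.
(1.57), we obtain

$P(B) = P(B \cap S) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(B \mid A_i)P(A_i)$

Fig. 1-14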


1.58. Show that for any events A and B in S,

$P(B) = P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A})$ (1.105)
From Eq. (1.79) (Prob. 1.29), we have
$P(B) = P(B \cap A) + P(B \cap \bar{A})$
Using Eq. (1.55), we obtain
$P(B) = P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A})$
Note that Eq. (1.105) is the special case of Eq. (1.60).


1.59. Suppose that a laboratory test to detect a certain disease has the following
statistics. Let

A = event that the tested person has the disease
B = event that the test result is positive

It is known that
$P(B \mid A) = 0.99$ and $P(B \mid \bar{A}) = 0.005$

and 0.1 percent of the population actually has the disease. What is the
probability that a person has the disease given that the test result is positive?


From the given statistics, we have
$P(A) = 0.001$, then $P(\bar{A}) = 0.999$

The desired probability is $P(A \mid B)$. Thus, using Eqs. (1.58) and (1.105), we
obtain

$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A})}$
$= \frac{(0.99)(0.001)}{(0.99)(0.001) + (0.005)(0.999)} \approx 0.165$

Note that in only 16.5 percent of the cases where the tests are positive will
the person actually have the disease even though the test is 99 percent
effective in detecting the disease when it is, in fact, present.
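
The calculation above can be packaged as a small function to see how the posterior responds to the prevalence. This is an illustrative sketch; the function and parameter names are assumptions introduced here, not notation from the text.

```python
def posterior_given_positive(prior, p_pos_given_disease, p_pos_given_healthy):
    """Bayes' rule: P(A | B) with P(B) expanded by total probability."""
    p_positive = (p_pos_given_disease * prior
                  + p_pos_given_healthy * (1.0 - prior))
    return p_pos_given_disease * prior / p_positive


# Values from Prob. 1.59.
p = posterior_given_positive(prior=0.001,
                             p_pos_given_disease=0.99,
                             p_pos_given_healthy=0.005)
print(round(p, 3))
```

Raising the prior (for example, by testing only a high-risk group) raises the posterior with the same test characteristics.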


1.60. A company producing electric relays has three manufacturing plants
producing 50, 30, and 20 percent, respectively, of its product. Suppose that
the probabilities that a relay manufactured by these plants is defective are
0.02, 0.05, and 0.01, respectively.

(a) If a relay is selected at random from the output of the company, what is
the probability that it is defective?

(b) If a relay selected at random is found to be defective, what is the
probability that it was manufactured by plant 2?


(a) Let B be the event that the relay is defective, and let $A_i$ be the event that
the relay is manufactured by plant i (i = 1, 2, 3). The desired probability
is P(B). Using Eq. (1.60), we have

$P(B) = \sum_{i=1}^{3} P(B \mid A_i)P(A_i)$
$= (0.02)(0.5) + (0.05)(0.3) + (0.01)(0.2) = 0.027$

(b) The desired probability is $P(A_2 \mid B)$. Using Eq. (1.58) and the result from
part (a), we obtain

$P(A_2 \mid B) = \frac{P(B \mid A_2)P(A_2)}{P(B)} = \frac{(0.05)(0.3)}{0.027} \approx 0.556$
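
Both parts follow the same pattern: a total-probability sum over a partition, then Bayes' rule for one member. The sketch below shows that pattern for the relay data; the function names are illustrative.

```python
def total_probability(priors, likelihoods):
    """Eq. (1.60): P(B) = sum of P(B | A_i) P(A_i) over a partition {A_i}."""
    return sum(l * p for l, p in zip(likelihoods, priors))


def posteriors(priors, likelihoods):
    """Bayes' rule: P(A_i | B) for each member of the partition."""
    p_b = total_probability(priors, likelihoods)
    return [l * p / p_b for l, p in zip(likelihoods, priors)]


plant_share = [0.5, 0.3, 0.2]        # P(A_i)
defect_rate = [0.02, 0.05, 0.01]     # P(B | A_i)

p_defective = total_probability(plant_share, defect_rate)
p_plant_given_defective = posteriors(plant_share, defect_rate)

print(round(p_defective, 3))
print([round(x, 3) for x in p_plant_given_defective])
```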


1.61. Two numbers are chosen at random from among the numbers 1 to 10 without
replacement. Find the probability that the second number chosen is 5.

Let $A_i$, i = 1, 2, ..., 10, denote the event that the first number chosen is i. Let
B be the event that the second number chosen is 5. Then by Eq. (1.60),

$P(B) = \sum_{i=1}^{10} P(B \mid A_i)P(A_i)$

Now $P(A_i) = \frac{1}{10}$. $P(B \mid A_i)$ is the probability that the second number chosen is
5, given that the first is i. If i = 5, then $P(B \mid A_i) = 0$. If $i \ne 5$, then
$P(B \mid A_i) = \frac{1}{9}$. Hence,

$P(B) = \sum_{i=1}^{10} P(B \mid A_i)P(A_i) = 9\left(\frac{1}{9}\right)\left(\frac{1}{10}\right) = \frac{1}{10}$


1.62. Consider the binary communication channel shown in Fig. 1-15. The
channel input symbol X may assume the state 0 or the state 1, and, similarly,
the channel output symbol Y may assume either the state 0 or the state 1.
Because of the channel noise, an input 0 may convert to an output 1 and vice
versa. The channel is characterized by the channel transition probabilities $p_0$,
$q_0$, $p_1$, and $q_1$, defined by

$p_0 = P(y_1 \mid x_0)$ and $p_1 = P(y_0 \mid x_1)$
$q_0 = P(y_0 \mid x_0)$ and $q_1 = P(y_1 \mid x_1)$

Fig. 1-15

where $x_0$ and $x_1$ denote the events (X = 0) and (X = 1), respectively, and $y_0$
and $y_1$ denote the events (Y = 0) and (Y = 1), respectively. Note that $p_0 + q_0 = 1 = p_1 + q_1$.
Let $P(x_0) = 0.5$, $p_0 = 0.1$, and $p_1 = 0.2$.

(a) Find $P(y_0)$ and $P(y_1)$.

(b) If a 0 was observed at the output, what is the probability that a 0 was the
input state?

(c) If a 1 was observed at the output, what is the probability that a 1 was the
input state?

(d) Calculate the probability of error $P_e$.


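(a) We note that
$P(x_1) = 1 - P(x_0) = 1 - 0.5 = 0.5$
$P(y_0 \mid x_0) = q_0 = 1 - p_0 = 1 - 0.1 = 0.9$
$P(y_1 \mid x_1) = q_1 = 1 - p_1 = 1 - 0.2 = 0.8$
Using Eq. (1.60), we obtain
$P(y_0) = P(y_0 \mid x_0)P(x_0) + P(y_0 \mid x_1)P(x_1) = 0.9(0.5) + 0.2(0.5) = 0.55$
$P(y_1) = P(y_1 \mid x_0)P(x_0) + P(y_1 \mid x_1)P(x_1) = 0.1(0.5) + 0.8(0.5) = 0.45$

(b) Using Bayes' rule (1.58), we have

$P(x_0 \mid y_0) = \frac{P(x_0)P(y_0 \mid x_0)}{P(y_0)} = \frac{(0.5)(0.9)}{0.55} \approx 0.818$

(c) Similarly,

$P(x_1 \mid y_1) = \frac{P(x_1)P(y_1 \mid x_1)}{P(y_1)} = \frac{(0.5)(0.8)}{0.45} \approx 0.889$

(d) The probability of error is
$P_e = P(y_1 \mid x_0)P(x_0) + P(y_0 \mid x_1)P(x_1) = 0.1(0.5) + 0.2(0.5) = 0.15$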
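
The four parts of Prob. 1.62 can be recomputed in a few lines. The sketch below uses the given values $P(x_0) = 0.5$, $p_0 = 0.1$, $p_1 = 0.2$; the Python variable names are chosen here for readability and are not from the text.

```python
# Given quantities from Prob. 1.62.
p_x0 = 0.5          # P(x0)
p0, p1 = 0.1, 0.2   # crossover probabilities P(y1 | x0) and P(y0 | x1)

p_x1 = 1.0 - p_x0
q0, q1 = 1.0 - p0, 1.0 - p1   # P(y0 | x0) and P(y1 | x1)

# (a) Output probabilities by total probability, Eq. (1.60).
p_y0 = q0 * p_x0 + p1 * p_x1
p_y1 = p0 * p_x0 + q1 * p_x1

# (b), (c) Posterior input probabilities by Bayes' rule, Eq. (1.58).
p_x0_given_y0 = q0 * p_x0 / p_y0
p_x1_given_y1 = q1 * p_x1 / p_y1

# (d) Probability of error: the output differs from the input.
p_error = p0 * p_x0 + p1 * p_x1

print(p_y0, p_y1, round(p_x0_given_y0, 3), round(p_x1_given_y1, 3), p_error)
```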
Independent Events 


1.63. Let A and B be events in an event space F. Show that if A and B are
independent, then so are (a) $A$ and $\bar{B}$, (b) $\bar{A}$ and $B$, and (c) $\bar{A}$ and $\bar{B}$.

(a) From Eq. (1.79) (Prob. 1.29), we have
$P(A) = P(A \cap B) + P(A \cap \bar{B})$

Since A and B are independent, using Eqs. (1.62) and (1.39), we obtain

$P(A \cap \bar{B}) = P(A) - P(A \cap B) = P(A) - P(A)P(B)$
$= P(A)[1 - P(B)] = P(A)P(\bar{B})$ (1.106)
Thus, by definition (1.62), $A$ and $\bar{B}$ are independent.
(b) Interchanging A and B in Eq. (1.106), we obtain
$P(B \cap \bar{A}) = P(B)P(\bar{A})$
which indicates that $\bar{A}$ and $B$ are independent.
(c) We have
$P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B})$ [Eq. (1.20)]
$= 1 - P(A \cup B)$ [Eq. (1.39)]
$= 1 - P(A) - P(B) + P(A \cap B)$ [Eq. (1.43)]
$= 1 - P(A) - P(B) + P(A)P(B)$ [Eq. (1.62)]
$= 1 - P(A) - P(B)[1 - P(A)]$
$= [1 - P(A)][1 - P(B)]$
$= P(\bar{A})P(\bar{B})$ [Eq. (1.39)]

Hence, $\bar{A}$ and $\bar{B}$ are independent.


1.64. Let A and B be events defined in an event space F. Show that if both P(A)
and P(B) are nonzero, then events A and B cannot be both mutually exclusive
and independent.

Let A and B be mutually exclusive events and $P(A) \ne 0$, $P(B) \ne 0$. Then
$P(A \cap B) = P(\varnothing) = 0$ but $P(A)P(B) \ne 0$. Since

$P(A \cap B) \ne P(A)P(B)$
A and B cannot be independent.


1.65. Show that if three events A, B, and C are independent, then A and $(B \cup C)$
are independent.

We have

$P[A \cap (B \cup C)] = P[(A \cap B) \cup (A \cap C)]$ [Eq. (1.18)]
$= P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)$ [Eq. (1.44)]
$= P(A)P(B) + P(A)P(C) - P(A)P(B)P(C)$ [independence of A, B, and C]
$= P(A)P(B) + P(A)P(C) - P(A)P(B \cap C)$ [independence of B and C]
$= P(A)[P(B) + P(C) - P(B \cap C)]$
$= P(A)P(B \cup C)$ [Eq. (1.44)]

Thus, A and $(B \cup C)$ are independent.


1.66. Consider the experiment of throwing two fair dice (Prob. 1.37). Let A be the 
event that the sum of the dice is 7, B be the event that the sum of the dice is 
6, and C be the event that the first die is 4. Show that events A and C are 
independent, but events B and C are not independent. 


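From Fig. 1-3 (Prob. 1.5), we see that

$A = \{\zeta_{16}, \zeta_{25}, \zeta_{34}, \zeta_{43}, \zeta_{52}, \zeta_{61}\}$
$B = \{\zeta_{15}, \zeta_{24}, \zeta_{33}, \zeta_{42}, \zeta_{51}\}$
$C = \{\zeta_{41}, \zeta_{42}, \zeta_{43}, \zeta_{44}, \zeta_{45}, \zeta_{46}\}$

and
$A \cap C = \{\zeta_{43}\}$, $B \cap C = \{\zeta_{42}\}$
Now
$P(A) = \frac{6}{36} = \frac{1}{6}$, $P(B) = \frac{5}{36}$, $P(C) = \frac{6}{36} = \frac{1}{6}$
and

$P(A \cap C) = \frac{1}{36} = P(A)P(C)$

Thus, events A and C are independent. But

$P(B \cap C) = \frac{1}{36} \ne P(B)P(C) = \frac{5}{216}$

Thus, events B and C are not independent.


1.67. In the experiment of throwing two fair dice, let A be the event that the first
die is odd, B be the event that the second die is odd, and C be the event that
the sum is odd. Show that events A, B, and C are pairwise independent, but
A, B, and C are not independent.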


From Fig. 1-3 (Prob. 1.5), we see that

$P(A) = P(B) = P(C) = \frac{18}{36} = \frac{1}{2}$

$P(A \cap B) = P(A \cap C) = P(B \cap C) = \frac{9}{36} = \frac{1}{4}$

Thus

$P(A \cap B) = \frac{1}{4} = P(A)P(B)$

$P(A \cap C) = \frac{1}{4} = P(A)P(C)$

$P(B \cap C) = \frac{1}{4} = P(B)P(C)$

which indicates that A, B, and C are pairwise independent. However, since
the sum of two odd numbers is even, $A \cap B \cap C = \varnothing$ and

$P(A \cap B \cap C) = 0 \ne \frac{1}{8} = P(A)P(B)P(C)$

which shows that A, B, and C are not independent.
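
The independence claims in Probs. 1.66 and 1.67 can be verified by enumerating the 36 outcomes. In this sketch, `product_rule_holds` is a helper name introduced for illustration: it tests only whether the probability of the intersection of the listed events equals the product of their probabilities.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes (i, j)


def event(predicate):
    """Set of outcomes satisfying the predicate."""
    return {o for o in S if predicate(o)}


def prob(e):
    return Fraction(len(e), len(S))


def product_rule_holds(*events):
    """True if P(intersection of events) equals the product of the P(event)."""
    joint = set(S).intersection(*events)
    expected = Fraction(1)
    for e in events:
        expected *= prob(e)
    return prob(joint) == expected


# Prob. 1.66
sum7 = event(lambda o: o[0] + o[1] == 7)
sum6 = event(lambda o: o[0] + o[1] == 6)
first4 = event(lambda o: o[0] == 4)

# Prob. 1.67
first_odd = event(lambda o: o[0] % 2 == 1)
second_odd = event(lambda o: o[1] % 2 == 1)
sum_odd = event(lambda o: (o[0] + o[1]) % 2 == 1)

print(product_rule_holds(sum7, first4), product_rule_holds(sum6, first4))
print(product_rule_holds(first_odd, second_odd, sum_odd))
```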


1.68. A system consisting of n separate components is said to be a series system if
it functions when all n components function (Fig. 1-16). Assume that the
components fail independently and that the probability of failure of
component i is $p_i$, i = 1, 2, ..., n. Find the probability that the system
functions.

Fig. 1-16 Series system.

Let $A_i$ be the event that component $s_i$ functions. Then
$P(A_i) = 1 - P(\bar{A}_i) = 1 - p_i$

Let A be the event that the system functions. Then, since the $A_i$'s are
independent, we obtain

$P(A) = P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i) = \prod_{i=1}^{n} (1 - p_i)$ (1.107)


1.69. A system consisting of n separate components is said to be a parallel system
if it functions when at least one of the components functions (Fig. 1-17).
Assume that the components fail independently and that the probability of
failure of component i is $p_i$, i = 1, 2, ..., n. Find the probability that the
system functions.

Fig. 1-17 Parallel system.

Let $A_i$ be the event that component $s_i$ functions. Then

$P(\bar{A}_i) = p_i$

Let A be the event that the system functions. Then, since the $A_i$'s are
independent, we obtain

$P(A) = 1 - P(\bar{A}) = 1 - P\left(\bigcap_{i=1}^{n} \bar{A}_i\right) = 1 - \prod_{i=1}^{n} p_i$ (1.108)
1.70. Using Eqs. (1.107) and (1.108), redo Prob. 1.40.

From Prob. 1.40, $p_i = \frac{1}{2}$, i = 1, 2, 3, 4, where $p_i$ is the probability of failure of
switch $s_i$. Let A be the event that there exists a closed path between a and b.
Using Eq. (1.108), the probability of failure for the parallel combination of
switches 3 and 4 is

$p_{34} = p_3 p_4 = \left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{1}{4}$

Using Eq. (1.107), the probability of failure for the combination of switches
2, 3, and 4 is

$p_{234} = 1 - \left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{4}\right) = 1 - \frac{3}{8} = \frac{5}{8}$
Again, using Eq. (1.108), we obtain

$P(A) = 1 - p_1 p_{234} = 1 - \left(\frac{1}{2}\right)\left(\frac{5}{8}\right) = 1 - \frac{5}{16} = \frac{11}{16}$
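
Equations (1.107) and (1.108) translate directly into two helper functions, which can then be composed as in the solution above. This is a sketch with illustrative names; it assumes independent failures, as the problems do, and uses `math.prod` (Python 3.8 or later).

```python
from fractions import Fraction
from math import prod


def series_reliability(fail_probs):
    """Eq. (1.107): a series system works only if every component works."""
    return prod(1 - p for p in fail_probs)


def parallel_reliability(fail_probs):
    """Eq. (1.108): a parallel system fails only if every component fails."""
    return 1 - prod(fail_probs)


half = Fraction(1, 2)

# Switches 3 and 4 in parallel, treated as one component.
p34 = 1 - parallel_reliability([half, half])
# Switch 2 in series with that combination.
p234 = 1 - series_reliability([half, p34])
# Switch 1 in parallel with the 2-3-4 combination.
p_path = parallel_reliability([half, p234])

print(p34, p234, p_path)
```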


1.71. A Bernoulli experiment is a random experiment, the outcome of which can 
be classified in but one of two mutually exclusive and exhaustive ways, say 
success or failure. A sequence of Bernoulli trials occurs when a Bernoulli 
experiment is performed several independent times so that the probability of 
success, say p, remains the same from trial to trial. Now an infinite sequence 
of Bernoulli trials is performed. Find the probability that (a) at least 1 
success occurs in the first n trials; (b) exactly k successes occur in the first
n trials; (c) all trials result in successes.


(a) In order to find the probability of at least 1 success in the first n trials, it
is easier to first compute the probability of the complementary event, that
of no successes in the first n trials. Let $A_i$ denote the event of a failure on
the ith trial. Then the probability of no successes is, by independence,

$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2) \cdots P(A_n) = (1 - p)^n$ (1.109)

Hence, the probability that at least 1 success occurs in the first n trials is
$1 - (1 - p)^n$.

(b) In any particular sequence of the first n outcomes, if k successes occur,
where k = 0, 1, 2, ..., n, then n - k failures occur. There are $\binom{n}{k}$ such
sequences, and each one of these has probability $p^k (1 - p)^{n-k}$. Thus, the
probability that exactly k successes occur in the first n trials is given by

$\binom{n}{k} p^k (1 - p)^{n-k}$

(c) Since $\bar{A}_i$ denotes the event of a success on the ith trial, the probability
that all trials resulted in successes in the first n trials is, by independence,

$P(\bar{A}_1 \cap \bar{A}_2 \cap \cdots \cap \bar{A}_n) = P(\bar{A}_1)P(\bar{A}_2) \cdots P(\bar{A}_n) = p^n$ (1.110)
Hence, using the continuity theorem of probability (1.89) (Prob. 1.34),
the probability that all trials result in successes is given by

$P\left(\bigcap_{i=1}^{\infty} \bar{A}_i\right) = P\left(\lim_{n \to \infty} \bigcap_{i=1}^{n} \bar{A}_i\right) = \lim_{n \to \infty} P\left(\bigcap_{i=1}^{n} \bar{A}_i\right) = \lim_{n \to \infty} p^n = 0$ if $p < 1$, and $= 1$ if $p = 1$
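
Parts (a) and (b) are simple to evaluate numerically. The sketch below uses illustrative values n = 10 and p = 0.3 (assumptions, not from the text) and checks that the probabilities of exactly k successes sum to 1.

```python
from math import comb


def prob_exactly_k(n, k, p):
    """Part (b): probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_at_least_one(n, p):
    """Part (a): 1 - (1 - p)^n, using Eq. (1.109)."""
    return 1 - (1 - p) ** n


n, p = 10, 0.3   # illustrative values
total = sum(prob_exactly_k(n, k, p) for k in range(n + 1))

print(total)
print(prob_at_least_one(n, p), 1 - prob_exactly_k(n, 0, p))
```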


1.72. Let S be the sample space of an experiment and S = {A, B, C}, where P(A) = 
p, P(B) =q, and P(C) = r. The experiment is repeated infinitely, and it is 
assumed that the successive experiments are independent. Find the 
probability of the event that A occurs before B. 


Suppose that A occurs for the first time at the nth trial of the experiment. If A
is to have occurred before B, then C must have occurred on the first (n - 1)
trials. Let D be the event that A occurs before B.

Then

$D = \bigcup_{n=1}^{\infty} D_n$

where $D_n$ is the event that C occurs on the first (n - 1) trials and A occurs on
the nth trial. Since the $D_n$'s are mutually exclusive, we have

$P(D) = \sum_{n=1}^{\infty} P(D_n)$

Since the trials are independent, we have

$P(D_n) = [P(C)]^{n-1} P(A) = r^{n-1} p$

Thus,
$P(D) = \sum_{n=1}^{\infty} r^{n-1} p = p \sum_{k=0}^{\infty} r^k = \frac{p}{1 - r} = \frac{p}{p + q}$
or
$P(D) = \frac{P(A)}{P(A) + P(B)}$ (1.111)

since p + q + r = 1.


1.73. In a gambling game, craps, a pair of dice is rolled and the outcome of the
experiment is the sum of the dice. The player wins on the first roll if the sum
is 7 or 11 and loses if the sum is 2, 3, or 12. If the sum is 4, 5, 6, 8, 9, or 10,
that number is called the player’s “point.” Once the point is established, the
rule is: If the player rolls a 7 before the point, the player loses; but if the
point is rolled before a 7, the player wins. Compute the probability of
winning in the game of craps.


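Let A, B, and C be the events that the player wins, the player wins on the first
roll, and the player gains point, respectively. Then P(A) = P(B) + P(C). Now
from Fig. 1-3 (Prob. 1.5),

$P(B) = P(\text{sum} = 7) + P(\text{sum} = 11) = \frac{6}{36} + \frac{2}{36} = \frac{2}{9}$

Let $A_k$ be the event that point of k occurs before 7. Then

$P(C) = \sum_{k \in \{4, 5, 6, 8, 9, 10\}} P(A_k) P(\text{point} = k)$

By Eq. (1.111) (Prob. 1.72),

$P(A_k) = \frac{P(\text{sum} = k)}{P(\text{sum} = k) + P(\text{sum} = 7)}$ (1.112)

Again from Fig. 1-3,

$P(\text{sum} = 4) = \frac{3}{36} \qquad P(\text{sum} = 5) = \frac{4}{36} \qquad P(\text{sum} = 6) = \frac{5}{36}$
$P(\text{sum} = 8) = \frac{5}{36} \qquad P(\text{sum} = 9) = \frac{4}{36} \qquad P(\text{sum} = 10) = \frac{3}{36}$

Now by Eq. (1.112),

$P(A_4) = \frac{1}{3} \qquad P(A_5) = \frac{2}{5} \qquad P(A_6) = \frac{5}{11}$
$P(A_8) = \frac{5}{11} \qquad P(A_9) = \frac{2}{5} \qquad P(A_{10}) = \frac{1}{3}$

Using these values, we obtain

$P(A) = P(B) + P(C) = \frac{2}{9} + \frac{134}{495} = 0.49293$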
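The craps computation can be checked with exact rational arithmetic. The sketch below rebuilds the distribution of the sum of two fair dice by enumeration (standing in for Fig. 1-3) and applies Eq. (1.111) to each point; the variable names are illustrative and not from the text.

```python
from fractions import Fraction
from itertools import product

# Distribution of the sum of two fair dice, built by enumerating all 36 outcomes.
p_sum = {}
for d1, d2 in product(range(1, 7), repeat=2):
    p_sum[d1 + d2] = p_sum.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

# P(B): the player wins on the first roll (sum is 7 or 11).
p_first_roll = p_sum[7] + p_sum[11]

# P(C): for each point k, P(point = k) times P(k occurs before 7), by Eq. (1.111).
p_point = sum(
    p_sum[k] * p_sum[k] / (p_sum[k] + p_sum[7])
    for k in (4, 5, 6, 8, 9, 10)
)

p_win = p_first_roll + p_point
print(p_win, float(p_win))
```

Using `Fraction` avoids rounding, so the result can be compared directly with the fractions obtained by hand.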


SUPPLEMENTARY PROBLEMS 




1.74. Consider the experiment of selecting items from a group consisting of three
items {a, b, c}.

(a) Find the sample space $S_1$ of the experiment in which two items are
selected without replacement.

(b) Find the sample space $S_2$ of the experiment in which two items are
selected with replacement.


1.75. Let A and B be arbitrary events. Then show that $A \subset B$ if and only if
$A \cup B = B$.


1.76. Let A and B be events in the sample space S. Show that if $A \subset B$, then
$\bar{B} \subset \bar{A}$.


1.77. Verify Eq. (1.19).

1.78. Show that
$(A \cap B) \setminus C = (A \setminus C) \cap (B \setminus C)$


1.79. Let A and B be any two events in S. Express the following events in terms of 
A and B. 


(a) At least one of the events occurs. 
(b) Exactly one of two events occurs. 


1.80. Show that A and B are disjoint if and only if
$A \cup B = A \,\Delta\, B$


1.81. Let A, B, and C be any three events in S. Express the following events in 
terms of these events. 


(a) Either B or C occurs, but not A. 
(b) Exactly one of the events occurs. 
(c) Exactly two of the events occur. 


1.82. Show that $F = \{S, \varnothing\}$ is an event space.


1.83. Let S = {1, 2, 3, 4} and $F_1 = \{S, \varnothing, \{1, 3\}, \{2, 4\}\}$, $F_2 = \{S, \varnothing, \{1, 3\}\}$.
Show that $F_1$ is an event space, and $F_2$ is not an event space.


1.84. In an experiment one card is selected from an ordinary 52-card deck. Define 
the events: A = select a King, B = select a Jack or a Queen, C = select a 


Heart. 
Find P(A), P(B), and P(C). 


1.85. A random experiment has sample space S = {a, b, c}. Suppose that P({a, c})
= 0.75 and P({b, c}) = 0.6. Find the probabilities of the elementary events.


1.86. Show that
(a) $P(\bar{A} \cup \bar{B}) = 1 - P(A \cap B)$
(b) $P(A \cap B) \ge 1 - P(\bar{A}) - P(\bar{B})$
(c) $P(A \,\Delta\, B) = P(A \cup B) - P(A \cap B)$
1.87. Let A, B, and C be three events in S. If
$P(A) = P(B) = \frac{1}{4}$, $P(C) = \frac{1}{3}$, $P(A \cap B) = \frac{1}{8}$, $P(A \cap C) = \frac{1}{6}$, and $P(B \cap C) = 0$,
find $P(A \cup B \cup C)$.
1.88. Verify Eq. (1.45).

1.89. Show that

$P(A_1 \cap A_2 \cap \cdots \cap A_n) \ge P(A_1) + P(A_2) + \cdots + P(A_n) - (n - 1)$


1.90. In an experiment consisting of 10 throws of a pair of fair dice, find the 
probability of the event that at least one double 6 occurs. 


1.91. Show that if P(A) > P(B), then P(A | B) > P(B | A). 


1.92. Show that
(a) $P(A \mid A) = 1$
(b) $P(A \cap B \mid C) = P(A \mid C) P(B \mid A \cap C)$

1.93. Show that
$P(A \cap B \cap C) = P(A \mid B \cap C) P(B \mid C) P(C)$
1.94. An urn contains 8 white balls and 4 red balls. The experiment consists of
drawing 2 balls from the urn without replacement. Find the probability that
both balls drawn are white.


1.95. There are 100 patients in a hospital with a certain disease. Of these, 10 are
selected to undergo a drug treatment that increases the percentage cured rate
from 50 percent to 75 percent. What is the probability that the patient
received a drug treatment if the patient is known to be cured?


1.96. Two boys and two girls enter a music hall and take four seats at random in a
row. What is the probability that the girls take the two end seats?


1.97. Let A and B be two independent events in S. It is known that $P(A \cap B) = 0.16$
and $P(A \cup B) = 0.64$. Find P(A) and P(B).


1.98. Consider the random experiment of Example 1.7 of rolling a die. Let A be
the event that the outcome is an odd number and B the event that the
outcome is less than 3. Show that events A and B are independent.


1.99. The relay network shown in Fig. 1-18 operates if and only if there is a closed
path of relays from left to right. Assume that relays fail independently and
that the probability of failure of each relay is as shown. What is the
probability that the relay network operates?


Fig. 1-18 


ANSWERS TO SUPPLEMENTARY PROBLEMS 


1.74. (a) $S_1$ = {ab, ac, ba, bc, ca, cb}
(b) $S_2$ = {aa, ab, ac, ba, bb, bc, ca, cb, cc}


1.75. Hint: Draw a Venn diagram.


1.76. Hint: Draw a Venn diagram. 
1.77. Hint: Draw a Venn diagram. 
1.78. Hint: Use Eqs. (1.10) and (1.17). 
1.79. (a) $A \cup B$; (b) $A \,\Delta\, B$
1.80. Hint: Follow Prob. 1.19. 
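1.81. (a) $\bar{A} \cap (B \cup C)$
(b) $\{A \cap \overline{(B \cup C)}\} \cup \{B \cap \overline{(A \cup C)}\} \cup \{C \cap \overline{(A \cup B)}\}$
(c) $\{(A \cap B) \cap \bar{C}\} \cup \{(A \cap C) \cap \bar{B}\} \cup \{(B \cap C) \cap \bar{A}\}$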


1.82. Hint: Follow Prob. 1.23. 

1.83. Hint: Follow Prob. 1.23. 

1.84. P(A) = 1/13, P(B) = 2/13, P(C) = 13/52 
1.85. P(a) = 0.4, P(b) = 0.25, P(c) = 0.35 


1.86. Hint: (a) Use Eqs. (1.21) and (1.39). 
(b) Use Eqs. (1.43), (1.39), and (1.42). 
(c) Use a Venn diagram. 


1.87. $\frac{13}{24}$


1.88. Hint: Prove by induction. 


1.89. Hint: Use induction to generalize Bonferroni’s inequality (1.77) (Prob. 
1.28). 


1.90. 0.246 


1.91. Hint: Use Eqs. (1.55) and (1.56). 


1.92. Hint: Use definition Eq. (1.55).

1.93. Hint: Follow Prob. 1.53.

1.94. 0.424

1.95. 0.143

1.96. $\frac{1}{6}$

1.97. P(A) = P(B) = 0.4

1.98. Hint: Show that $P(A \cap B) = P(A)P(B) = 1/6$.

1.99. 0.865


CHAPTER 2 


Random Variables 


2.1 Introduction 


In this chapter, the concept of a random variable is introduced. The main purpose 
of using a random variable is so that we can define certain probability functions 
that make it both convenient and easy to compute the probabilities of various 
events. 


2.2 Random Variables 


A. Definitions: 


Consider a random experiment with sample space S. A random variable $X(\zeta)$ is a
single-valued real function that assigns a real number called the value of $X(\zeta)$ to
each sample point $\zeta$ of S. Often, we use a single letter X for this function in place
of $X(\zeta)$ and use r.v. to denote the random variable.

Note that the terminology used here is traditional. Clearly a random variable is
not a variable at all in the usual sense; it is a function.

The sample space S is termed the domain of the r.v. X, and the collection of all
numbers [values of $X(\zeta)$] is termed the range of the r.v. X. Thus, the range of X is a
certain subset of the set of all real numbers (Fig. 2-1).


Fig. 2-1 Random variable X as a function.


Note that two or more different sample points might give the same value of
$X(\zeta)$, but two different numbers in the range cannot be assigned to the same sample
point.


EXAMPLE 2.1 In the experiment of tossing a coin once (Example 1.1), we might
define the r.v. X as (Fig. 2-2)

X(H) = 1    X(T) = 0

Note that we could also define another r.v., say Y or Z, with

Y(H) = 0, Y(T) = 1    or    Z(H) = 0, Z(T) = 0


Fig. 2-2 One random variable associated with coin tossing.


B. Events Defined by Random Variables: 


If X is a r.v. and x is a fixed real number, we can define the event (X = x) as

$(X = x) = \{\zeta : X(\zeta) = x\}$ (2.1)

Similarly, for fixed numbers x, $x_1$, and $x_2$, we can define the following events:

$(X \le x) = \{\zeta : X(\zeta) \le x\}$
$(X > x) = \{\zeta : X(\zeta) > x\}$ (2.2)
$(x_1 < X \le x_2) = \{\zeta : x_1 < X(\zeta) \le x_2\}$

These events have probabilities that are denoted by

$P(X = x) = P\{\zeta : X(\zeta) = x\}$
$P(X \le x) = P\{\zeta : X(\zeta) \le x\}$ (2.3)
$P(X > x) = P\{\zeta : X(\zeta) > x\}$
$P(x_1 < X \le x_2) = P\{\zeta : x_1 < X(\zeta) \le x_2\}$


EXAMPLE 2.2 In the experiment of tossing a fair coin three times (Prob. 1.1), the
sample space $S_1$ consists of eight equally likely sample points $S_1$ = {HHH, ...,
TTT}. If X is the r.v. giving the number of heads obtained, find (a) P(X = 2); (b)
P(X < 2).


(a) Let $A \subset S_1$ be the event defined by X = 2. Then, from Prob. 1.1, we have

$A = (X = 2) = \{\zeta : X(\zeta) = 2\} = \{HHT, HTH, THH\}$

Since the sample points are equally likely, we have

$P(X = 2) = P(A) = \frac{3}{8}$


(b) Let $B \subset S_1$ be the event defined by X < 2. Then

$B = (X < 2) = \{\zeta : X(\zeta) < 2\} = \{HTT, THT, TTH, TTT\}$

and $P(X < 2) = P(B) = \frac{4}{8} = \frac{1}{2}$


2.3 Distribution Functions 


A. Definition: 


The distribution function [or cumulative distribution function (cdf)] of X is the
function defined by

$F_X(x) = P(X \le x) \qquad -\infty < x < \infty$ (2.4)

Most of the information about a random experiment described by the r.v. X is
determined by the behavior of $F_X(x)$.


B. Properties of $F_X(x)$:


Several properties of $F_X(x)$ follow directly from its definition (2.4).

1. $0 \le F_X(x) \le 1$ (2.5)
2. $F_X(x_1) \le F_X(x_2)$ if $x_1 < x_2$ (2.6)
3. $\lim_{x \to \infty} F_X(x) = F_X(\infty) = 1$ (2.7)
4. $\lim_{x \to -\infty} F_X(x) = F_X(-\infty) = 0$ (2.8)
5. $\lim_{x \to a^+} F_X(x) = F_X(a^+) = F_X(a) \qquad a^+ = \lim_{0 < \varepsilon \to 0} a + \varepsilon$ (2.9)


Property 1 follows because $F_X(x)$ is a probability. Property 2 shows that $F_X(x)$ is
a nondecreasing function (Prob. 2.5). Properties 3 and 4 follow from Eqs. (1.22)
and (1.26):

$\lim_{x \to \infty} P(X \le x) = P(X \le \infty) = P(S) = 1$
$\lim_{x \to -\infty} P(X \le x) = P(X \le -\infty) = P(\varnothing) = 0$

Property 5 indicates that $F_X(x)$ is continuous on the right. This is the
consequence of the definition (2.4).


EXAMPLE 2.3 Consider the r.v. X defined in Example 2.2. Find and sketch the
cdf $F_X(x)$ of X.

Table 2-1 gives $F_X(x) = P(X \le x)$ for x = -1, 0, 1, 2, 3, 4. Since the value of X
must be an integer, the value of $F_X(x)$ for noninteger values of x must be the same
as the value of $F_X(x)$ for the nearest smaller integer value of x. The $F_X(x)$ is
sketched in Fig. 2-3. Note that $F_X(x)$ has jumps at x = 0, 1, 2, 3, and that at each
jump the upper value is the correct value for $F_X(x)$.


TABLE 2-1

 x    $(X \le x)$                               $F_X(x)$
-1    $\varnothing$                             0
 0    {TTT}                                     1/8
 1    {TTT, TTH, THT, HTT}                      4/8 = 1/2
 2    {TTT, TTH, THT, HTT, HHT, HTH, THH}       7/8
 3    S                                         1
 4    S                                         1


Fig. 2-3
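As a cross-check on Table 2-1, the cdf of Example 2.3 can be tabulated by brute-force enumeration of the eight equally likely outcomes. This is only a sketch; the helper name `cdf` is illustrative and not from the text.

```python
from fractions import Fraction
from itertools import product

# The eight equally likely outcomes of three tosses of a fair coin.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

def cdf(x):
    """F_X(x) = P(X <= x), where X counts the heads in an outcome."""
    favorable = [s for s in outcomes if s.count("H") <= x]
    return Fraction(len(favorable), len(outcomes))

# Evaluate at the arguments used in Table 2-1.
table = {x: cdf(x) for x in (-1, 0, 1, 2, 3, 4)}
for x, value in table.items():
    print(x, value)
```

Evaluating `cdf` at a noninteger argument such as 2.5 returns the value at the nearest smaller integer, which is the staircase behavior described in the example.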


C. Determination of Probabilities from the Distribution Function: 


From definition (2.4), we can compute other probabilities, such as $P(a < X \le b)$,
P(X > a), and P(X < b) (Prob. 2.6):

$P(a < X \le b) = F_X(b) - F_X(a)$ (2.10)
$P(X > a) = 1 - F_X(a)$ (2.11)
$P(X < b) = F_X(b^-) \qquad b^- = \lim_{0 < \varepsilon \to 0} b - \varepsilon$ (2.12)


2.4 Discrete Random Variables and Probability Mass Functions 


A. Definition: 


Let X be a r.v. with cdf $F_X(x)$. If $F_X(x)$ changes values only in jumps (at most a
countable number of them) and is constant between jumps, that is, $F_X(x)$ is a
staircase function (see Fig. 2-3), then X is called a discrete random variable.
Alternatively, X is a discrete r.v. only if its range contains a finite or countably
infinite number of points. The r.v. X in Example 2.3 is an example of a discrete r.v.


B. Probability Mass Functions: 


Suppose that the jumps in $F_X(x)$ of a discrete r.v. X occur at the points $x_1, x_2, \ldots$,
where the sequence may be either finite or countably infinite, and we assume $x_i < x_j$
if i < j.
Then

$F_X(x_i) - F_X(x_{i-1}) = P(X \le x_i) - P(X \le x_{i-1}) = P(X = x_i)$ (2.13)

Let

$p_X(x) = P(X = x)$ (2.14)

The function $p_X(x)$ is called the probability mass function (pmf) of the discrete
r.v. X.


Properties of $p_X(x)$:

1. $0 \le p_X(x_k) \le 1 \qquad k = 1, 2, \ldots$ (2.15)
2. $p_X(x) = 0$ if $x \ne x_k$ (k = 1, 2, ...) (2.16)
3. $\sum_k p_X(x_k) = 1$ (2.17)


The cdf $F_X(x)$ of a discrete r.v. X can be obtained by

$F_X(x) = P(X \le x) = \sum_{x_k \le x} p_X(x_k)$ (2.18)


2.5 Continuous Random Variables and Probability Density Functions 


A. Definition: 


Let X be a r.v. with cdf $F_X(x)$. If $F_X(x)$ is continuous and also has a derivative
$dF_X(x)/dx$ which exists everywhere except at possibly a finite number of points and
is piecewise continuous, then X is called a continuous random variable.
Alternatively, X is a continuous r.v. only if its range contains an interval (either
finite or infinite) of real numbers. Thus, if X is a continuous r.v., then (Prob. 2.18)

$P(X = x) = 0$ (2.19)

Note that this is an example of an event with probability 0 that is not necessarily
the impossible event $\varnothing$.
In most applications, the r.v. is either discrete or continuous. But if the cdf $F_X(x)$
of a r.v. X possesses features of both discrete and continuous r.v.'s, then the r.v. X is
called the mixed r.v. (Prob. 2.10).


B. Probability Density Functions: 


Let

$f_X(x) = \frac{dF_X(x)}{dx}$ (2.20)

The function $f_X(x)$ is called the probability density function (pdf) of the
continuous r.v. X.

Properties of $f_X(x)$:

1. $f_X(x) \ge 0$ (2.21)
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ (2.22)
3. $f_X(x)$ is piecewise continuous.
4. $P(a < X \le b) = \int_a^b f_X(x)\,dx$ (2.23)


The cdf $F_X(x)$ of a continuous r.v. X can be obtained by

$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(\xi)\,d\xi$ (2.24)

By Eq. (2.19), if X is a continuous r.v., then

$P(a < X \le b) = P(a \le X \le b) = P(a \le X < b) = P(a < X < b) = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a)$ (2.25)

2.6 Mean and Variance 


A. Mean: 


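The mean (or expected value) of a r.v. X, denoted by $\mu_X$ or E(X), is defined by

$\mu_X = E(X) = \begin{cases} \sum_k x_k\, p_X(x_k) & X\text{: discrete} \\ \int_{-\infty}^{\infty} x f_X(x)\,dx & X\text{: continuous} \end{cases}$ (2.26)


B. Moment:

The nth moment of a r.v. X is defined by

$E(X^n) = \begin{cases} \sum_k x_k^n\, p_X(x_k) & X\text{: discrete} \\ \int_{-\infty}^{\infty} x^n f_X(x)\,dx & X\text{: continuous} \end{cases}$ (2.27)

Note that the mean of X is the first moment of X.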


C. Variance: 


The variance of a r.v. X, denoted by $\sigma_X^2$ or Var(X), is defined by

$\sigma_X^2 = \mathrm{Var}(X) = E\{[X - E(X)]^2\}$ (2.28)

Thus,

$\sigma_X^2 = \begin{cases} \sum_k (x_k - \mu_X)^2\, p_X(x_k) & X\text{: discrete} \\ \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx & X\text{: continuous} \end{cases}$ (2.29)

Note from definition (2.28) that

$\mathrm{Var}(X) \ge 0$ (2.30)

The standard deviation of a r.v. X, denoted by $\sigma_X$, is the positive square root of
Var(X).
Expanding the right-hand side of Eq. (2.28), we can obtain the following
relation:

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$ (2.31)


which is a useful formula for determining the variance. 
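The discrete forms of the mean, the second moment, and the variance translate directly into sums over a pmf. The sketch below uses a fair die as an assumed example (it is not part of this section) and confirms that the definition of the variance and the shortcut Var(X) = E(X^2) - [E(X)]^2 agree; all function names are illustrative.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) for a discrete r.v. given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def second_moment(pmf):
    """E(X^2), the second moment."""
    return sum(x ** 2 * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) computed from the definition E{[X - E(X)]^2}."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Assumed example: a fair six-sided die, each face with probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}

print(mean(die), variance(die))
print(second_moment(die) - mean(die) ** 2)  # shortcut form of the variance
```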


2.7 Some Special Distributions 


In this section we present some important special distributions. 


A. Bernoulli Distribution: 


A r.v. X is called a Bernoulli r.v. with parameter p if its pmf is given by

$p_X(k) = P(X = k) = p^k (1 - p)^{1-k} \qquad k = 0, 1$ (2.32)

where 0 < p < 1. By Eq. (2.18), the cdf $F_X(x)$ of the Bernoulli r.v. X is given by

$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - p & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases}$ (2.33)


Fig. 2-4 illustrates a Bernoulli distribution.


Fig. 2-4 Bernoulli distribution.


The mean and variance of the Bernoulli r.v. X are

$\mu_X = E(X) = p$ (2.34)
$\sigma_X^2 = \mathrm{Var}(X) = p(1 - p)$ (2.35)


A Bernoulli r.v. X is associated with some experiment where an outcome can be 
classified as either a “success” or a “failure,” and the probability of a success is p 
and the probability of a failure is 1 — p. Such experiments are often called 
Bernoulli trials (Prob. 1.71). 


B. Binomial Distribution: 


A r.v. X is called a binomial r.v. with parameters (n, p) if its pmf is given by

$p_X(k) = P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} \qquad k = 0, 1, \ldots, n$ (2.36)

where 0 < p < 1 and

$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$

which is known as the binomial coefficient. The corresponding cdf of X is

$F_X(x) = \sum_{k=0}^{m} \binom{n}{k} p^k (1 - p)^{n-k} \qquad m \le x < m + 1$ (2.37)

where m is an integer with $0 \le m \le n$; $F_X(x) = 0$ for x < 0 and $F_X(x) = 1$ for $x \ge n$.


Fig. 2-5 illustrates the binomial distribution for n = 6 and p = 0.6.


Fig. 2-5 Binomial distribution with n = 6, p = 0.6.


The mean and variance of the binomial r.v. X are (Prob. 2.28)

$\mu_X = E(X) = np$ (2.38)
$\sigma_X^2 = \mathrm{Var}(X) = np(1 - p)$ (2.39)


A binomial r.v. X is associated with some experiments in which n independent
Bernoulli trials are performed and X represents the number of successes that occur
in the n trials. Note that a Bernoulli r.v. is just a binomial r.v. with parameters (1,
p).
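A short numerical check of the binomial pmf (2.36) and of the mean and variance formulas (2.38) and (2.39), using the parameters of Fig. 2-5 (n = 6, p = 0.6). This is a minimal sketch; `binomial_pmf` is an illustrative name, and `math.comb` requires Python 3.8 or later.

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial r.v. with parameters (n, p), Eq. (2.36)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 6, 0.6
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                          # should be 1
mean = sum(k * pk for k, pk in enumerate(pmf))            # should equal n*p
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # should equal n*p*(1-p)

print(total, mean, var)
```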


C. Geometric Distribution: 


A r.v. X is called a geometric r.v. with parameter p if its pmf is given by

$p_X(x) = P(X = x) = (1 - p)^{x-1} p \qquad x = 1, 2, \ldots$ (2.40)

where 0 < p < 1. The cdf $F_X(x)$ of the geometric r.v. X is given by

$F_X(x) = P(X \le x) = 1 - (1 - p)^x \qquad x = 1, 2, \ldots$ (2.41)


Fig. 2-6 illustrates the geometric distribution with p = 0.25.

Fig. 2-6 Geometric distribution with p = 0.25.


The mean and variance of the geometric r.v. X are (Probs. 2.29 and 4.55)

$\mu_X = E(X) = \frac{1}{p}$ (2.42)

$\sigma_X^2 = \mathrm{Var}(X) = \frac{1 - p}{p^2}$ (2.43)


A geometric r.v. X is associated with some experiments in which a sequence of 
Bernoulli trials with probability p of success is obtained. The sequence is observed 
until the first success occurs. The r.v. X denotes the trial number on which the first 
success occurs. 

Memoryless property of the geometric distribution: 

If X is a geometric r.v., then it has the following significant property (Probs. 

22D) 


$P(X > i + j \mid X > i) = P(X > j) \qquad i, j \ge 1$ (2.44)


Equation (2.44) indicates the following: suppose that after i flips of a coin no "head" has
turned up yet. Then the probability for no "head" to turn up for the next j flips of
the coin is exactly the same as the probability for no "head" to turn up for the first
j flips of the coin.

Equation (2.44) is known as the memoryless property. Note that memoryless 
property Eq. (2.44) is only valid when i, 7 are integers. The geometric distribution 
is the only discrete distribution that possesses this property. 
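The memoryless property can be verified exactly from the geometric cdf $F_X(x) = 1 - (1 - p)^x$, which gives the tail $P(X > i) = (1 - p)^i$. The sketch below assumes p = 1/4 (the value used in Fig. 2-6) and uses illustrative function names.

```python
from fractions import Fraction

def tail(i, p):
    """P(X > i) = (1 - p)^i for a geometric r.v., from the cdf (2.41)."""
    return (1 - p) ** i

def conditional_tail(i, j, p):
    """P(X > i + j | X > i); the event (X > i + j) is contained in (X > i)."""
    return tail(i + j, p) / tail(i, p)

p = Fraction(1, 4)
# The conditional tail does not depend on i: it always equals P(X > j).
for i in range(1, 4):
    print(i, conditional_tail(i, 2, p), tail(2, p))
```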


D. Negative Binomial Distribution: 


A r.v. X is called a negative binomial r.v. with parameters p and k if its pmf is
given by

$p_X(x) = P(X = x) = \binom{x - 1}{k - 1} p^k (1 - p)^{x-k} \qquad x = k, k + 1, \ldots$ (2.45)

where 0 < p < 1.
Fig. 2-7 illustrates the negative binomial distribution for k = 2 and p = 0.25.

Fig. 2-7 Negative binomial distribution with k = 2 and p = 0.25.


The mean and variance of the negative binomial r.v. X are (Probs. 2.80, 4.56)

$\mu_X = E(X) = \frac{k}{p}$ (2.46)

$\sigma_X^2 = \mathrm{Var}(X) = \frac{k(1 - p)}{p^2}$ (2.47)

A negative binomial r.v. X is associated with a sequence of independent Bernoulli
trials with the probability of success p, and X represents the number of trials until
the kth success is obtained. In the experiment of flipping a coin, if X = x, then it
must be true that there were exactly k - 1 heads thrown in the first x - 1 flippings,
and a head must have been thrown on the xth flipping. There are $\binom{x - 1}{k - 1}$
sequences of length x with these properties, and each of them is assigned the same
probability of $p^k (1 - p)^{x-k}$.
Note that when k = 1, X is a geometric r.v. A negative binomial r.v. is
sometimes called a Pascal r.v.


E. Poisson Distribution: 


A r.v. X is called a Poisson r.v. with parameter $\lambda$ (> 0) if its pmf is given by

$p_X(k) = P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!} \qquad k = 0, 1, \ldots$ (2.48)

The corresponding cdf of X is

$F_X(x) = e^{-\lambda} \sum_{k=0}^{n} \frac{\lambda^k}{k!} \qquad n \le x < n + 1$ (2.49)


Fig. 2-8 illustrates the Poisson distribution for $\lambda$ = 3.

Fig. 2-8 Poisson distribution with $\lambda$ = 3.


The mean and variance of the Poisson r.v. X are (Prob. 2.31)

$\mu_X = E(X) = \lambda$ (2.50)
$\sigma_X^2 = \mathrm{Var}(X) = \lambda$ (2.51)

The Poisson r.v. has a tremendous range of applications in diverse areas
because it may be used as an approximation for a binomial r.v. with parameters (n,
p) when n is large and p is small enough so that np is of a moderate size (Prob.
2.43).
Some examples of Poisson r.v.’s include 


1. The number of telephone calls arriving at a switching center during various 
intervals of time 


2. The number of misprints on a page of a book 
3. The number of customers entering a bank during various intervals of time 
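The Poisson approximation to the binomial mentioned above can be illustrated numerically. The sketch below compares the two pmfs for the assumed values n = 1000 and p = 0.003 (so np = 3, matching the parameter of Fig. 2-8); these numbers and the function names are illustrative choices, not taken from the text.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Binomial pmf, Eq. (2.36)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson pmf, Eq. (2.48)."""
    return exp(-lam) * lam ** k / factorial(k)

n, p = 1000, 0.003      # assumed: n large, p small
lam = n * p             # moderate np, used as the Poisson parameter

# Largest pointwise gap between the two pmfs over k = 0..10.
max_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11)
)
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
```

For these values the two pmfs differ by well under one part in a thousand at every k examined, which is the sense in which the approximation is used.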


F. Discrete Uniform Distribution: 


A r.v. X is called a discrete uniform r.v. if its pmf is given by

$p_X(x) = P(X = x) = \frac{1}{n} \qquad 1 \le x \le n$ (2.52)

The cdf $F_X(x)$ of the discrete uniform r.v. X is given by

$F_X(x) = P(X \le x) = \begin{cases} 0 & x < 1 \\ \dfrac{\lfloor x \rfloor}{n} & 1 \le x < n \\ 1 & n \le x \end{cases}$ (2.53)

where $\lfloor x \rfloor$ denotes the greatest integer less than or equal to x.
Fig. 2-9 illustrates the discrete uniform distribution for n = 6.

Fig. 2-9 Discrete uniform distribution with n = 6.


The mean and variance of the discrete uniform r.v. X are (Prob. 2.32)

$\mu_X = E(X) = \frac{1}{2}(n + 1)$ (2.54)

$\sigma_X^2 = \mathrm{Var}(X) = \frac{1}{12}(n^2 - 1)$ (2.55)


The discrete uniform r.v. X is associated with cases where all finite outcomes of 
an experiment are equally likely. If the sample space is a countably infinite set, 
such as the set of positive integers, then it is not possible to define a discrete 
uniform r.v. X. If the sample space is an uncountable set with finite length such as 
the interval (a, b), then a continuous uniform r.v. X will be utilized. 


G. Continuous Uniform Distribution: 


A r.v. X is called a continuous uniform r.v. over (a, b) if its pdf is given by

$f_X(x) = \begin{cases} \dfrac{1}{b - a} & a < x < b \\ 0 & \text{otherwise} \end{cases}$ (2.56)


The corresponding cdf of X is

$F_X(x) = \begin{cases} 0 & x \le a \\ \dfrac{x - a}{b - a} & a < x < b \\ 1 & x \ge b \end{cases}$ (2.57)

Fig. 2-10 illustrates a continuous uniform distribution.

Fig. 2-10 Continuous uniform distribution over (a, b).


The mean and variance of the uniform r.v. X are (Prob. 2.34)

$\mu_X = E(X) = \frac{a + b}{2}$ (2.58)

$\sigma_X^2 = \mathrm{Var}(X) = \frac{(b - a)^2}{12}$ (2.59)


A uniform r.v. X is often used where we have no prior knowledge of the actual 
pdf and all continuous values in some range seem equally likely (Prob. 2.75). 


H. Exponential Distribution: 


A r.v. X is called an exponential r.v. with parameter $\lambda$ (> 0) if its pdf is given by

$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & x < 0 \end{cases}$ (2.60)

which is sketched in Fig. 2-11(a). The corresponding cdf of X is

$F_X(x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$ (2.61)

which is sketched in Fig. 2-11(b).


Fig. 2-11 Exponential distribution.


The mean and variance of the exponential r.v. X are (Prob. 2.35)

$\mu_X = E(X) = \frac{1}{\lambda}$ (2.62)

$\sigma_X^2 = \mathrm{Var}(X) = \frac{1}{\lambda^2}$ (2.63)


Memoryless property of the exponential distribution:


If X is an exponential r.v., then it has the following interesting memoryless
property (cf. Eq. (2.44)) (Prob. 2.58)

$P(X > s + t \mid X > s) = P(X > t) \qquad s, t \ge 0$ (2.64)
Equation (2.64) indicates that if X represents the lifetime of an item, then the item 
that has been in use for some time is as good as a new item with respect to the 
amount of time remaining until the item fails. The exponential distribution is the 
only continuous distribution that possesses this property. This memoryless 


property is a fundamental property of the exponential distribution and is basic for 
the theory of Markov processes (see Sec. 5.5). 
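The memoryless property follows from the exponential tail $P(X > x) = 1 - F_X(x) = e^{-\lambda x}$ for x >= 0. The sketch below checks Eq. (2.64) numerically for assumed values of $\lambda$, s, and t; these values and the function name `tail` are illustrative only.

```python
from math import exp, isclose

def tail(x, lam):
    """P(X > x) = exp(-lam * x) for an exponential r.v., valid for x >= 0."""
    return exp(-lam * x)

lam = 0.5          # assumed rate parameter
s, t = 2.0, 3.0    # assumed elapsed time and additional time

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
lhs = tail(s + t, lam) / tail(s, lam)
rhs = tail(t, lam)
print(lhs, rhs)
```

The ratio collapses because $e^{-\lambda(s+t)} / e^{-\lambda s} = e^{-\lambda t}$, so the elapsed time s drops out.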


I. Gamma Distribution: 


A r.v. X is called a gamma r.v. with parameter ($\alpha$, $\lambda$) ($\alpha$ > 0 and $\lambda$ > 0) if its pdf is
given by

$f_X(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)} & x > 0 \\ 0 & x < 0 \end{cases}$ (2.65)

where $\Gamma(\alpha)$ is the gamma function defined by

$\Gamma(\alpha) = \int_0^{\infty} e^{-x} x^{\alpha - 1}\,dx \qquad \alpha > 0$ (2.66)

and it satisfies the following recursive formula (Prob. 2.26)

$\Gamma(\alpha + 1) = \alpha \Gamma(\alpha) \qquad \alpha > 0$ (2.67)


The pdf $f_X(x)$ with ($\alpha$, $\lambda$) = (1, 1), (2, 1), and (5, 2) are plotted in Fig. 2-12.


Fig. 2-12 Gamma distributions for selected values of $\alpha$ and $\lambda$.


The mean and variance of the gamma r.v. are (Prob. 4.65)

$\mu_X = E(X) = \frac{\alpha}{\lambda}$ (2.68)

$\sigma_X^2 = \mathrm{Var}(X) = \frac{\alpha}{\lambda^2}$ (2.69)


Note that when $\alpha$ = 1, the gamma r.v. becomes an exponential r.v. with parameter $\lambda$
[Eq. (2.60)], and when $\alpha$ = n/2, $\lambda$ = 1/2, the gamma pdf becomes

$f_X(x) = \frac{(1/2)\, e^{-x/2} (x/2)^{n/2 - 1}}{\Gamma(n/2)} \qquad x > 0$ (2.70)

which is the chi-squared r.v. pdf with n degrees of freedom (Prob. 4.40). When $\alpha$ =
n (integer), the gamma distribution is sometimes known as the Erlang distribution.


J. Normal (or Gaussian) Distribution: 


A r.v. X is called a normal (or Gaussian) r.v. if its pdf is given by

$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / (2\sigma^2)}$ (2.71)

The corresponding cdf of X is

$F_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-(\xi - \mu)^2 / (2\sigma^2)}\,d\xi = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x - \mu)/\sigma} e^{-\xi^2 / 2}\,d\xi$ (2.72)

This integral cannot be evaluated in a closed form and must be evaluated
numerically. It is convenient to use the function $\Phi(z)$, defined as

$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\xi^2 / 2}\,d\xi$ (2.73)

to help us to evaluate the value of $F_X(x)$. Then Eq. (2.72) can be written as

$F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$ (2.74)

Note that

$\Phi(-z) = 1 - \Phi(z)$ (2.75)

The function $\Phi(z)$ is tabulated in Table A (Appendix A). Fig. 2-13 illustrates a
normal distribution.


Fig. 2-13 Normal distribution.


The mean and variance of the normal r.v. X are (Prob. 2.36)

$\mu_X = E(X) = \mu$ (2.76)
$\sigma_X^2 = \mathrm{Var}(X) = \sigma^2$ (2.77)


We shall use the notation $N(\mu; \sigma^2)$ to denote that X is normal with mean $\mu$ and
variance $\sigma^2$. A normal r.v. Z with zero mean and unit variance, that is, Z = N(0; 1),
is called a standard normal r.v. Note that the cdf of the standard normal r.v. is
given by Eq. (2.73). The normal r.v. is probably the most important type of
continuous r.v. It has played a significant role in the study of random phenomena 
in nature. Many naturally occurring random phenomena are approximately normal. 
Another reason for the importance of the normal r.v. is a remarkable theorem 
called the central limit theorem. This theorem states that the sum of a large number 
of independent r.v.’s, under certain conditions, can be approximated by a normal 
r.v. (see Sec. 4.8C). 
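Where a table of $\Phi(z)$ is not at hand, the standard normal cdf can be evaluated through the error function, using the identity $\Phi(z) = \frac{1}{2}[1 + \mathrm{erf}(z/\sqrt{2})]$. The sketch below relies on Python's `math.erf`; the names `phi` and `normal_cdf` are illustrative, and the parameter values in the usage lines are assumed.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cdf, Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """cdf of a normal r.v. with mean mu and standard deviation sigma."""
    return phi((x - mu) / sigma)

print(phi(1.0))                   # about 0.8413
print(phi(-1.3), 1 - phi(1.3))    # symmetry: Phi(-z) = 1 - Phi(z)
print(normal_cdf(5.0, 5.0, 2.0))  # at the mean the cdf equals 0.5
```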


2.8 Conditional Distributions 


In Sec. 1.6 the conditional probability of an event A given event B is defined as

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad P(B) > 0$

The conditional cdf $F_X(x \mid B)$ of a r.v. X given event B is defined by

$F_X(x \mid B) = P(X \le x \mid B) = \frac{P\{(X \le x) \cap B\}}{P(B)}$ (2.78)

The conditional cdf $F_X(x \mid B)$ has the same properties as $F_X(x)$. (See Prob. 1.43 and
Sec. 2.3.) In particular,

$F_X(-\infty \mid B) = 0 \qquad F_X(\infty \mid B) = 1$ (2.79)
$P(a < X \le b \mid B) = F_X(b \mid B) - F_X(a \mid B)$ (2.80)

If X is a discrete r.v., then the conditional pmf $p_X(x_k \mid B)$ is defined by

$p_X(x_k \mid B) = P(X = x_k \mid B) = \frac{P\{(X = x_k) \cap B\}}{P(B)}$ (2.81)

If X is a continuous r.v., then the conditional pdf $f_X(x \mid B)$ is defined by

$f_X(x \mid B) = \frac{dF_X(x \mid B)}{dx}$ (2.82)


SOLVED PROBLEMS 


Random Variables 


2.1. Consider the experiment of throwing a fair die. Let X be the r.v. which
assigns 1 if the number that appears is even and 0 if the number that appears
is odd.


(a) What is the range of X?
(b) Find P(X = 1) and P(X = 0).


The sample space S on which X is defined consists of 6 points which are
equally likely:

S = {1, 2, 3, 4, 5, 6}

(a) The range of X is $R_X$ = {0, 1}.
(b) (X = 1) = {2, 4, 6}. Thus, $P(X = 1) = \frac{3}{6} = \frac{1}{2}$. Similarly, (X = 0) = {1, 3,
5}, and $P(X = 0) = \frac{1}{2}$.


2.2. Consider the experiment of tossing a coin three times (Prob. 1.1). Let X be
the r.v. giving the number of heads obtained. We assume that the tosses are
independent and the probability of a head is p.

(a) What is the range of X?
(b) Find the probabilities P(X = 0), P(X = 1), P(X = 2), and P(X = 3).


The sample space S on which X is defined consists of eight sample points
(Prob. 1.1):

S = {HHH, HHT, ..., TTT}

(a) The range of X is $R_X$ = {0, 1, 2, 3}.
(b) If P(H) = p, then P(T) = 1 - p. Since the tosses are independent, we
have

$P(X = 0) = P[\{TTT\}] = (1 - p)^3$
$P(X = 1) = P[\{HTT\}] + P[\{THT\}] + P[\{TTH\}] = 3(1 - p)^2 p$
$P(X = 2) = P[\{HHT\}] + P[\{HTH\}] + P[\{THH\}] = 3(1 - p) p^2$
$P(X = 3) = P[\{HHH\}] = p^3$


2.3. An information source generates symbols at random from a four-letter
alphabet {a, b, c, d} with probabilities $P(a) = \frac{1}{2}$, $P(b) = \frac{1}{4}$, and
$P(c) = P(d) = \frac{1}{8}$. A coding scheme encodes these symbols into binary
codes as follows:

a    0
b    10
c    110
d    111

Let X be the r.v. denoting the length of the code, that is, the number of
binary symbols (bits).

(a) What is the range of X?
(b) Assuming that the generations of symbols are independent, find the
probabilities P(X = 1), P(X = 2), P(X = 3), and P(X > 3).


(a) The range of X is $R_X$ = {1, 2, 3}.
(b)
$P(X = 1) = P[\{a\}] = P(a) = \frac{1}{2}$
$P(X = 2) = P[\{b\}] = P(b) = \frac{1}{4}$
$P(X = 3) = P[\{c, d\}] = P(c) + P(d) = \frac{1}{4}$
$P(X > 3) = P(\varnothing) = 0$


2.4. Consider the experiment of throwing a dart onto a circular plate with unit
radius. Let X be the r.v. representing the distance of the point where the dart
lands from the origin of the plate. Assume that the dart always lands on the
plate and that the dart is equally likely to land anywhere on the plate.

(a) What is the range of X?
(b) Find (i) P(X < a) and (ii) P(a < X < b), where $a < b \le 1$.


(a) The range of X is $R_X = \{x : 0 \le x \le 1\}$.

(b) (i) (X < a) denotes that the point is inside the circle of radius a. Since
the dart is equally likely to fall anywhere on the plate, we have (Fig. 2-14)

$P(X < a) = \frac{\pi a^2}{\pi (1)^2} = a^2 \qquad 0 \le a \le 1$

(ii) (a < X < b) denotes the event that the point is inside the annular ring with
inner radius a and outer radius b. Thus, from Fig. 2-14, we have

$P(a < X < b) = \frac{\pi (b^2 - a^2)}{\pi (1)^2} = b^2 - a^2 \qquad 0 \le a < b \le 1$
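The area argument of Prob. 2.4 can be sanity-checked by simulation. The sketch below draws points uniformly on the unit disk by rejection sampling from the enclosing square and estimates P(X < a) and P(a < X < b) for the assumed values a = 0.5 and b = 0.8; the seed, sample size, and function name are arbitrary illustrative choices.

```python
import random
from math import hypot

def sample_distance(rng):
    """Distance from the origin of a point drawn uniformly on the unit disk."""
    while True:
        # Draw in the square [-1, 1] x [-1, 1]; keep only points on the disk.
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        r = hypot(x, y)
        if r <= 1.0:
            return r

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 100_000
samples = [sample_distance(rng) for _ in range(n)]

a, b = 0.5, 0.8          # assumed radii with a < b <= 1
est_inner = sum(r < a for r in samples) / n      # estimate of a^2
est_ring = sum(a < r < b for r in samples) / n   # estimate of b^2 - a^2
print(est_inner, a ** 2)
print(est_ring, b ** 2 - a ** 2)
```

The estimates should fall close to $a^2 = 0.25$ and $b^2 - a^2 = 0.39$, up to sampling error.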


Distribution Function 


2.5. Verify Eq. (2.6).
Let $x_1 < x_2$. Then $(X \le x_1)$ is a subset of $(X \le x_2)$; that is, $(X \le x_1) \subset (X \le x_2)$. Then,
by Eq. (1.41), we have

$P(X \le x_1) \le P(X \le x_2)$ or $F_X(x_1) \le F_X(x_2)$


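2.6. Verify (a) Eq. (2.10); (b) Eq. (2.11); (c) Eq. (2.12).


(a) Since $(X \le b) = (X \le a) \cup (a < X \le b)$ and $(X \le a) \cap (a < X \le b) = \varnothing$, we
have

$P(X \le b) = P(X \le a) + P(a < X \le b)$
or $F_X(b) = F_X(a) + P(a < X \le b)$
Thus, $P(a < X \le b) = F_X(b) - F_X(a)$


(b) Since $(X \le a) \cup (X > a) = S$ and $(X \le a) \cap (X > a) = \varnothing$, we have

$P(X \le a) + P(X > a) = P(S) = 1$

Thus, $P(X > a) = 1 - P(X \le a) = 1 - F_X(a)$

(c) Now

$P(X < b) = P\left[\lim_{\varepsilon \to 0,\ \varepsilon > 0} X \le b - \varepsilon\right] = \lim_{\varepsilon \to 0,\ \varepsilon > 0} P(X \le b - \varepsilon) = \lim_{\varepsilon \to 0,\ \varepsilon > 0} F_X(b - \varepsilon) = F_X(b^-)$


2.7. Show that

(a) $P(a \le X \le b) = P(X = a) + F_X(b) - F_X(a)$ (2.83)
(b) $P(a < X < b) = F_X(b) - F_X(a) - P(X = b)$ (2.84)
(c) $P(a \le X < b) = P(X = a) + F_X(b) - F_X(a) - P(X = b)$ (2.85)


(a) Using Eqs. (1.37) and (2.10), we have

$P(a \le X \le b) = P[(X = a) \cup (a < X \le b)]$
$= P(X = a) + P(a < X \le b)$
$= P(X = a) + F_X(b) - F_X(a)$

(b) We have

$P(a < X \le b) = P[(a < X < b) \cup (X = b)]$
$= P(a < X < b) + P(X = b)$

Again using Eq. (2.10), we obtain

$P(a < X < b) = P(a < X \le b) - P(X = b)$
$= F_X(b) - F_X(a) - P(X = b)$

(c) Similarly,

$P(a \le X \le b) = P[(a \le X < b) \cup (X = b)]$
$= P(a \le X < b) + P(X = b)$

Using Eq. (2.83), we obtain

$P(a \le X < b) = P(a \le X \le b) - P(X = b)$
$= P(X = a) + F_X(b) - F_X(a) - P(X = b)$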

2.8. Let X be the r.v. defined in Prob. 2.3.
(a) Sketch the cdf $F_X(x)$ of X and specify the type of X.
(b) Find (i) $P(X \le 1)$, (ii) $P(1 < X \le 2)$, (iii) P(X > 1), and (iv) $P(1 \le X \le 2)$.


(a) From the result of Prob. 2.3 and Eq. (2.18), we have

$F_X(x) = P(X \le x) = \begin{cases} 0 & x < 1 \\ \frac{1}{2} & 1 \le x < 2 \\ \frac{3}{4} & 2 \le x < 3 \\ 1 & x \ge 3 \end{cases}$

which is sketched in Fig. 2-15. The r.v. X is a discrete r.v.


Fig. 2-15


(b) (i) We see that

$P(X \le 1) = F_X(1) = \frac{1}{2}$

(ii) By Eq. (2.10),

$P(1 < X \le 2) = F_X(2) - F_X(1) = \frac{3}{4} - \frac{1}{2} = \frac{1}{4}$

(iii) By Eq. (2.11),

$P(X > 1) = 1 - F_X(1) = 1 - \frac{1}{2} = \frac{1}{2}$

(iv) By Eq. (2.83),

$P(1 \le X \le 2) = P(X = 1) + F_X(2) - F_X(1) = \frac{1}{2} + \frac{3}{4} - \frac{1}{2} = \frac{3}{4}$


2.9. Sketch the cdf $F_X(x)$ of the r.v. X defined in Prob. 2.4 and specify the type of
X.


From the result of Prob. 2.4, we have

$F_X(x) = P(X \le x) = \begin{cases} 0 & x < 0 \\ x^2 & 0 \le x < 1 \\ 1 & 1 \le x \end{cases}$

which is sketched in Fig. 2-16. The r.v. X is a continuous r.v.


Fig. 2-16


2.10. Consider the function given by

$F(x) = \begin{cases} 0 & x < 0 \\ x + \frac{1}{2} & 0 \le x < \frac{1}{2} \\ 1 & x \ge \frac{1}{2} \end{cases}$

(a) Sketch F(x) and show that F(x) has the properties of a cdf discussed in
Sec. 2.3B.
(b) If X is the r.v. whose cdf is given by F(x), find (i) $P(X \le \frac{1}{4})$, (ii)
$P(0 < X \le \frac{1}{4})$, (iii) P(X = 0), and (iv) $P(0 \le X \le \frac{1}{4})$.
(c) Specify the type of X.


(a) The function F(x) is sketched in Fig. 2-17. From Fig. 2-17, we see that
$0 \le F(x) \le 1$ and F(x) is a nondecreasing function, $F(-\infty) = 0$, $F(\infty) = 1$,
$F(0) = \frac{1}{2}$, and F(x) is continuous on the right. Thus, F(x) satisfies all the
properties [Eqs. (2.5) to (2.9)] required of a cdf.


Fig. 2-17


(b) (i) We have

$P\left(X \le \frac{1}{4}\right) = F\left(\frac{1}{4}\right) = \frac{1}{4} + \frac{1}{2} = \frac{3}{4}$

(ii) By Eq. (2.10),

$P\left(0 < X \le \frac{1}{4}\right) = F\left(\frac{1}{4}\right) - F(0) = \frac{3}{4} - \frac{1}{2} = \frac{1}{4}$

(iii) By Eq. (2.12),

$P(X = 0) = P(X \le 0) - P(X < 0) = F(0) - F(0^-) = \frac{1}{2} - 0 = \frac{1}{2}$

(iv) By Eq. (2.83),

$P\left(0 \le X \le \frac{1}{4}\right) = P(X = 0) + F\left(\frac{1}{4}\right) - F(0) = \frac{1}{2} + \frac{3}{4} - \frac{1}{2} = \frac{3}{4}$

(c) The r.v. X is a mixed r.v.


2.11. Find the values of constants a and b such that

$F_X(x) = \begin{cases} 1 - a e^{-x/b} & x \ge 0 \\ 0 & x < 0 \end{cases}$

is a valid cdf.


To satisfy property 1 of $F_X(x)$ [$0 \le F_X(x) \le 1$], we must have $0 \le a \le 1$ and b
> 0. Since b > 0, property 3 of $F_X(x)$ [$F_X(\infty) = 1$] is satisfied. It is seen that
property 4 of $F_X(x)$ [$F_X(-\infty) = 0$] is also satisfied. For $0 \le a \le 1$ and b > 0,
$F_X(x)$ is sketched in Fig. 2-18. From Fig. 2-18, we see that $F_X(x)$ is a
nondecreasing function and continuous on the right, and properties 2 and 5
of $F_X(x)$ are satisfied. Hence, we conclude that $F_X(x)$ given is a valid cdf if
$0 \le a \le 1$ and b > 0. Note that if a = 0, then the r.v. X is a discrete r.v.; if a = 1,
then X is a continuous r.v.; and if 0 < a < 1, then X is a mixed r.v.


Fig. 2-18


Discrete Random Variables and Pmf’s 


2.12. Suppose a discrete r.v. X has the following pmfs: 
pe(!) = py(2) =! py(3) =! py(4) =! 


(a) Find and sketch the cdf $F_X(x)$ of the r.v. X.
(b) Find (i) P(X < 1), (ii) $P(1 < X \le 3)$, (iii) $P(1 \le X \le 3)$.


(a) By Eq. (2.18), we obtain 


0 | 

4 l=x<2 

2 
Fy(x)=P(XSx)= : 2 =.5%.5 

U 3=x<4 

8 

I x24 


which is sketched in Fig. 2-19. 


F Ax) 


Fig. 2-19 
(b) (1) By Eq. (2.12), we see that 
P(X <1) =F,(1-) =0 


(ii) By Eq. (2.10), 


PU<XS3)=F3)~ Fe =—— 


(iii) By Eq. (2.83), 


_ A ee joe oe Lo ED 
POS X S=3)= P(X =1) + Fy (3) Ra s+ 5 


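2.13. (a) Verify that the function p(x) defined by

$$p(x) = \begin{cases} \frac{3}{4}\left(\frac{1}{4}\right)^x & x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$$

is a pmf of a discrete r.v. X.
(b) Find (i) $P(X = 2)$, (ii) $P(X \le 2)$, (iii) $P(X \ge 1)$.


(a) It is clear that $0 \le p(x) \le 1$ and

$$\sum_{i=0}^{\infty} p(i) = \frac{3}{4} \sum_{i=0}^{\infty} \left(\frac{1}{4}\right)^i = \frac{3}{4} \cdot \frac{1}{1 - \frac{1}{4}} = 1$$

Thus, p(x) satisfies all properties of the pmf [Eqs. (2.15) to (2.17)] of a discrete r.v. X.

(b) (i) By definition (2.14),

$$P(X = 2) = p(2) = \frac{3}{4}\left(\frac{1}{4}\right)^2 = \frac{3}{64}$$

(ii) By Eq. (2.18),

$$P(X \le 2) = \sum_{i=0}^{2} p(i) = \frac{3}{4}\left(1 + \frac{1}{4} + \frac{1}{16}\right) = \frac{63}{64}$$

(iii) By Eq. (1.39),

$$P(X \ge 1) = 1 - P(X = 0) = 1 - p(0) = 1 - \frac{3}{4} = \frac{1}{4}$$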


2.14. Consider the experiment of tossing an honest coin repeatedly (Prob. 1.41). 
Let the r.v. X denote the number of tosses required until the first head 
appears. 


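(a) Find and sketch the pmf $p_X(x)$ and the cdf $F_X(x)$ of X.
(b) Find (i) $P(1 < X \le 4)$, (ii) $P(X > 4)$.

(a) From the result of Prob. 1.41, the pmf of X is given by

$$p_X(x) = p_X(k) = P(X = k) = \left(\frac{1}{2}\right)^k \qquad k = 1, 2, \ldots$$

Then by Eq. (2.18),

$$F_X(x) = P(X \le x) = \sum_{k=1}^{\lfloor x \rfloor} p_X(k) = \sum_{k=1}^{\lfloor x \rfloor} \left(\frac{1}{2}\right)^k$$

where $\lfloor x \rfloor$ is the integer part of x, or

$$F_X(x) = \begin{cases} 0 & x < 1 \\ \frac{1}{2} & 1 \le x < 2 \\ \frac{3}{4} & 2 \le x < 3 \\ \vdots & \vdots \\ 1 - \left(\frac{1}{2}\right)^n & n \le x < n + 1 \\ \vdots & \vdots \end{cases}$$

These functions are sketched in Fig. 2-20.

(b) (i) By Eq. (2.10),

$$P(1 < X \le 4) = F_X(4) - F_X(1) = \frac{15}{16} - \frac{1}{2} = \frac{7}{16}$$

(ii) By Eq. (1.39),

$$P(X > 4) = 1 - P(X \le 4) = 1 - F_X(4) = 1 - \frac{15}{16} = \frac{1}{16}$$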


2.15. Let X be a binomial r.v. with parameters (n, p).
(a) Show that $p_X(x)$ given by Eq. (2.36) satisfies Eq. (2.17).
(b) Find $P(X > 1)$ if n = 6 and p = 0.1.


(a) Recall that the binomial expansion formula is given by

$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \tag{2.86}$$

Thus, by Eq. (2.36),

$$\sum_{k=0}^{n} p_X(k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k} = (p + 1 - p)^n = 1^n = 1$$

(b) Now

$$P(X > 1) = 1 - P(X = 0) - P(X = 1) = 1 - \binom{6}{0}(0.1)^0(0.9)^6 - \binom{6}{1}(0.1)^1(0.9)^5$$

$$= 1 - (0.9)^6 - 6(0.1)(0.9)^5 \approx 0.114$$
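
The two parts of Prob. 2.15 can be checked numerically. The following is a minimal sketch, not part of the original text; the helper name `binomial_pmf` is an assumption introduced here for illustration, and it simply evaluates the binomial pmf of Eq. (2.36).

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial r.v. with parameters (n, p) (hypothetical helper)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 6, 0.1
# Part (a): the pmf values should sum to 1 over k = 0, ..., n.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
# Part (b): P(X > 1) = 1 - P(X = 0) - P(X = 1).
tail = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)
print(total, round(tail, 3))
```

The rounded tail probability should agree with the value 0.114 obtained above.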


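2.16. Let X be a geometric r.v. with parameter p.
(a) Show that $p_X(x)$ given by Eq. (2.40) satisfies Eq. (2.17).
(b) Find the cdf $F_X(x)$ of X.


(a) Recall that for a geometric series, the sum is given by

$$\sum_{n=0}^{\infty} a r^n = \sum_{n=1}^{\infty} a r^{n-1} = \frac{a}{1 - r} \qquad |r| < 1 \tag{2.87}$$

Thus,

$$\sum_{x} p_X(x) = \sum_{i=1}^{\infty} p_X(i) = \sum_{i=1}^{\infty} (1 - p)^{i-1} p = \frac{p}{1 - (1 - p)} = \frac{p}{p} = 1$$

(b) Using Eq. (2.87), we obtain

$$P(X > k) = \sum_{i=k+1}^{\infty} (1 - p)^{i-1} p = \frac{(1 - p)^k p}{1 - (1 - p)} = (1 - p)^k \tag{2.88}$$

Thus,

$$P(X \le k) = 1 - P(X > k) = 1 - (1 - p)^k \tag{2.89}$$

and

$$F_X(x) = P(X \le x) = 1 - (1 - p)^x \qquad x = 1, 2, \ldots \tag{2.90}$$

Note that the r.v. X of Prob. 2.14 is the geometric r.v. with $p = \frac{1}{2}$.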


2.17. Verify the memoryless property (Eq. 2.44) for the geometric distribution.

$$P(X > i + j \mid X > i) = P(X > j) \qquad i, j \ge 1$$

From Eq. (2.41) and using Eq. (2.11), we obtain

$$P(X > i) = 1 - P(X \le i) = (1 - p)^i \qquad \text{for } i \ge 1 \tag{2.91}$$

Now, by definition of conditional probability, Eq. (1.55), we obtain

$$P(X > i + j \mid X > i) = \frac{P[\{X > i + j\} \cap \{X > i\}]}{P(X > i)} = \frac{P(X > i + j)}{P(X > i)}$$

$$= \frac{(1 - p)^{i+j}}{(1 - p)^i} = (1 - p)^j = P(X > j) \qquad i, j \ge 1$$
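
The memoryless property can also be spot-checked numerically. This is a small sketch under the assumption p = 0.3 (an arbitrary illustrative value); the function names are hypothetical and only restate Eq. (2.91) and the definition of conditional probability.

```python
def geometric_tail(i: int, p: float) -> float:
    """P(X > i) = (1 - p)**i for a geometric r.v., as in Eq. (2.91)."""
    return (1 - p) ** i

def conditional_tail(i: int, j: int, p: float) -> float:
    """P(X > i + j | X > i), computed as P(X > i + j) / P(X > i)."""
    return geometric_tail(i + j, p) / geometric_tail(i, p)

p = 0.3  # illustrative parameter value, not taken from the text
for i, j in [(1, 1), (2, 5), (7, 3)]:
    # The conditional tail should not depend on i.
    assert abs(conditional_tail(i, j, p) - geometric_tail(j, p)) < 1e-12
```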


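2.18. Let X be a negative binomial r.v. with parameters p and k. Show that $p_X(x)$ given by Eq. (2.45) satisfies Eq. (2.17) for k = 2.


From Eq. (2.45),

$$p_X(x) = \binom{x-1}{k-1} p^k (1 - p)^{x-k} \qquad x = k, k + 1, \ldots$$

Let k = 2 and 1 - p = q. Then

$$p_X(x) = \binom{x-1}{1} p^2 q^{x-2} = (x - 1) p^2 q^{x-2} \qquad x = 2, 3, \ldots \tag{2.92}$$

$$\sum_{x=2}^{\infty} p_X(x) = \sum_{x=2}^{\infty} (x - 1) p^2 q^{x-2} = p^2 + 2p^2 q + 3p^2 q^2 + 4p^2 q^3 + \cdots \tag{2.93}$$

Now let

$$S = p^2 + 2p^2 q + 3p^2 q^2 + 4p^2 q^3 + \cdots$$

Then

$$qS = p^2 q + 2p^2 q^2 + 3p^2 q^3 + 4p^2 q^4 + \cdots$$

Subtracting the second series from the first series, we obtain

$$(1 - q)S = p^2 + p^2 q + p^2 q^2 + p^2 q^3 + \cdots = p^2(1 + q + q^2 + q^3 + \cdots) = p^2 \frac{1}{1 - q}$$

and we have

$$S = \frac{p^2}{(1 - q)^2} = \frac{p^2}{p^2} = 1$$

Thus,

$$\sum_{x=2}^{\infty} p_X(x) = 1$$


2.19. Let X be a Poisson r.v. with parameter λ.

(a) Show that $p_X(x)$ given by Eq. (2.48) satisfies Eq. (2.17).
(b) Find $P(X > 2)$ with λ = 4.


(a) By Eq. (2.48),

$$\sum_{k=0}^{\infty} p_X(k) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1$$

(b) With λ = 4, we have

$$p_X(k) = e^{-4} \frac{4^k}{k!}$$

and

$$P(X \le 2) = \sum_{k=0}^{2} p_X(k) = e^{-4}(1 + 4 + 8) \approx 0.238$$

Thus,

$$P(X > 2) = 1 - P(X \le 2) \approx 1 - 0.238 = 0.762$$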
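
As a quick numerical check of the Poisson computation with λ = 4, the pmf of Eq. (2.48) can be summed directly. This is an illustrative sketch only; `poisson_pmf` is a name introduced here, not something defined in the text.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson r.v. with parameter lam (hypothetical helper)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 4.0
p_le_2 = sum(poisson_pmf(k, lam) for k in range(3))   # P(X <= 2)
p_gt_2 = 1 - p_le_2                                   # P(X > 2)
# Truncated sum over k = 0..59; the neglected tail is negligible for lam = 4.
partial_total = sum(poisson_pmf(k, lam) for k in range(60))
print(round(p_le_2, 3), round(p_gt_2, 3))
```

The truncated sum `partial_total` should be very close to 1, consistent with Eq. (2.17).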


Continuous Random Variables and Pdf’s 
2.20. Verify Eq. (2.19).

From Eqs. (1.41) and (2.10), we have

$$P(X = x) \le P(x - \varepsilon < X \le x) = F_X(x) - F_X(x - \varepsilon)$$

for any ε > 0. As $F_X(x)$ is continuous, the right-hand side of the above expression approaches 0 as ε → 0.

Thus, P(X = x) = 0.


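2.21. The pdf of a continuous r.v. X is given by

$$f_X(x) = \begin{cases} \frac{1}{3} & 0 < x < 1 \\ \frac{2}{3} & 1 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

Find the corresponding cdf $F_X(x)$ and sketch $f_X(x)$ and $F_X(x)$.


By Eq. (2.24), the cdf of X is given by

$$F_X(x) = \begin{cases} 0 & x < 0 \\ \int_0^x \frac{1}{3}\, d\xi = \frac{x}{3} & 0 \le x < 1 \\ \int_0^1 \frac{1}{3}\, d\xi + \int_1^x \frac{2}{3}\, d\xi = \frac{2}{3}x - \frac{1}{3} & 1 \le x < 2 \\ \int_0^1 \frac{1}{3}\, d\xi + \int_1^2 \frac{2}{3}\, d\xi = 1 & 2 \le x \end{cases}$$

The functions $f_X(x)$ and $F_X(x)$ are sketched in Fig. 2-21.


Fig. 2-21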
2.22. Let X be a continuous r.v. with pdf

$$f_X(x) = \begin{cases} kx & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

where k is a constant.
(a) Determine the value of k and sketch $f_X(x)$.
(b) Find and sketch the corresponding cdf $F_X(x)$.
(c) Find $P(\frac{1}{4} < X \le 2)$.


(a) By Eq. (2.21), we must have k > 0, and by Eq. (2.22),

$$\int_0^1 kx\, dx = k \left. \frac{x^2}{2} \right|_0^1 = \frac{k}{2} = 1$$

Thus, k = 2 and

$$f_X(x) = \begin{cases} 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

which is sketched in Fig. 2-22(a).


Fig. 2-22

(b) By Eq. (2.24), the cdf of X is given by

$$F_X(x) = \begin{cases} 0 & x < 0 \\ \int_0^x 2\xi\, d\xi = x^2 & 0 \le x < 1 \\ \int_0^1 2\xi\, d\xi = 1 & 1 \le x \end{cases}$$

which is sketched in Fig. 2-22(b).

(c) By Eq. (2.25),

$$P\left(\tfrac{1}{4} < X \le 2\right) = F_X(2) - F_X\left(\tfrac{1}{4}\right) = 1 - \left(\tfrac{1}{4}\right)^2 = \tfrac{15}{16}$$


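2.23. Show that the pdf of a normal r.v. X given by Eq. (2.71) satisfies Eq. (2.22).


From Eq. (2.71),

$$\int_{-\infty}^{\infty} f_X(x)\, dx = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/(2\sigma^2)}\, dx$$

Let $y = (x - \mu)/(\sqrt{2}\sigma)$. Then $dx = \sqrt{2}\sigma\, dy$ and

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/(2\sigma^2)}\, dx = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-y^2}\, dy$$

Let

$$I = \int_{-\infty}^{\infty} e^{-y^2}\, dy$$

Then

$$I^2 = \left[\int_{-\infty}^{\infty} e^{-x^2}\, dx\right]\left[\int_{-\infty}^{\infty} e^{-y^2}\, dy\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)}\, dx\, dy$$

Letting x = r cos θ and y = r sin θ (that is, using polar coordinates), we have

$$I^2 = \int_0^{2\pi}\int_0^{\infty} e^{-r^2} r\, dr\, d\theta = 2\pi \int_0^{\infty} e^{-r^2} r\, dr = \pi$$

Thus,

$$I = \int_{-\infty}^{\infty} e^{-y^2}\, dy = \sqrt{\pi} \tag{2.94}$$

and

$$\int_{-\infty}^{\infty} f_X(x)\, dx = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-y^2}\, dy = \frac{1}{\sqrt{\pi}} \sqrt{\pi} = 1$$


2.24. Consider a function

$$f(x) = \frac{1}{\sqrt{\pi}}\, e^{(-x^2 + x - a)} \qquad -\infty < x < \infty$$

Find the value of a such that f(x) is a pdf of a continuous r.v. X.

$$f(x) = \frac{1}{\sqrt{\pi}}\, e^{(-x^2 + x - a)} = \frac{1}{\sqrt{\pi}}\, e^{-(x^2 - x + 1/4 + a - 1/4)} = \left[\frac{1}{\sqrt{\pi}}\, e^{-(x - 1/2)^2}\right] e^{-(a - 1/4)}$$

If f(x) is a pdf of a continuous r.v. X, then by Eq. (2.22), we must have

$$\int_{-\infty}^{\infty} f(x)\, dx = e^{-(a - 1/4)} \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}}\, e^{-(x - 1/2)^2}\, dx = 1$$

Now by Eq. (2.52), the pdf of $N(\frac{1}{2}; \frac{1}{2})$ is $\frac{1}{\sqrt{\pi}} e^{-(x - 1/2)^2}$. Thus,

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}}\, e^{-(x - 1/2)^2}\, dx = 1 \qquad \text{and} \qquad \int_{-\infty}^{\infty} f(x)\, dx = e^{-(a - 1/4)} = 1$$

from which we obtain $a = \frac{1}{4}$.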


2.25. A r.v. X is called a Rayleigh r.v. if its pdf is given by

$$f_X(x) = \begin{cases} \dfrac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)} & x > 0 \\ 0 & x < 0 \end{cases} \tag{2.95}$$

(a) Determine the corresponding cdf $F_X(x)$.
(b) Sketch $f_X(x)$ and $F_X(x)$ for σ = 1.


(a) By Eq. (2.24), the cdf of X is

$$F_X(x) = \int_0^x \frac{\xi}{\sigma^2}\, e^{-\xi^2/(2\sigma^2)}\, d\xi \qquad x \ge 0$$

Let $y = \xi^2/(2\sigma^2)$. Then $dy = (1/\sigma^2)\,\xi\, d\xi$, and

$$F_X(x) = \int_0^{x^2/(2\sigma^2)} e^{-y}\, dy = 1 - e^{-x^2/(2\sigma^2)} \tag{2.96}$$

(b) With σ = 1, we have

$$f_X(x) = \begin{cases} x e^{-x^2/2} & x > 0 \\ 0 & x < 0 \end{cases}$$

$$F_X(x) = \begin{cases} 1 - e^{-x^2/2} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

These functions are sketched in Fig. 2-23.


Fig. 2-23 Rayleigh distribution with σ = 1.


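2.26. Consider a gamma r.v. X with parameter (α, λ) (α > 0 and λ > 0) defined by Eq. (2.65).
(a) Show that the gamma function has the following properties:

1. $$\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha) \qquad \alpha > 0 \tag{2.97}$$

2. $$\Gamma(k + 1) = k! \qquad k\ (\ge 0)\text{: integer} \tag{2.98}$$

3. $$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi} \tag{2.99}$$

(b) Show that the pdf given by Eq. (2.65) satisfies Eq. (2.22).


(a) Integrating Eq. (2.66) by parts ($u = x^{\alpha - 1}$, $dv = e^{-x}\, dx$), we obtain

$$\Gamma(\alpha) = \left. -e^{-x} x^{\alpha - 1} \right|_0^{\infty} + \int_0^{\infty} e^{-x} (\alpha - 1) x^{\alpha - 2}\, dx = (\alpha - 1) \int_0^{\infty} e^{-x} x^{\alpha - 2}\, dx = (\alpha - 1)\,\Gamma(\alpha - 1) \tag{2.100}$$

Replacing α by α + 1 in Eq. (2.100), we get Eq. (2.97).
Next, by applying Eq. (2.97) repeatedly using an integral value of α, say α = k, we obtain

$$\Gamma(k + 1) = k\,\Gamma(k) = k(k - 1)\,\Gamma(k - 1) = k(k - 1) \cdots (2)\,\Gamma(1)$$

Since

$$\Gamma(1) = \int_0^{\infty} e^{-x}\, dx = 1$$

it follows that Γ(k + 1) = k!. Finally, by Eq. (2.66),

$$\Gamma\left(\tfrac{1}{2}\right) = \int_0^{\infty} e^{-x} x^{-1/2}\, dx$$

Let $y = x^{1/2}$. Then $dy = \frac{1}{2} x^{-1/2}\, dx$, and

$$\Gamma\left(\tfrac{1}{2}\right) = 2 \int_0^{\infty} e^{-y^2}\, dy = \int_{-\infty}^{\infty} e^{-y^2}\, dy = \sqrt{\pi}$$

in view of Eq. (2.94).

(b) Now

$$\int_{-\infty}^{\infty} f_X(x)\, dx = \int_0^{\infty} \frac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)}\, dx = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} e^{-\lambda x} x^{\alpha - 1}\, dx$$

Let y = λx. Then dy = λ dx and

$$\int_{-\infty}^{\infty} f_X(x)\, dx = \frac{\lambda^{\alpha}}{\Gamma(\alpha)\lambda^{\alpha}} \int_0^{\infty} e^{-y} y^{\alpha - 1}\, dy = \frac{\lambda^{\alpha}}{\Gamma(\alpha)\lambda^{\alpha}}\, \Gamma(\alpha) = 1$$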


Mean and Variance 


2.27. Consider a discrete r.v. X whose pmf is given by

$$p_X(x) = \begin{cases} \frac{1}{3} & x = -1, 0, 1 \\ 0 & \text{otherwise} \end{cases}$$

(a) Plot $p_X(x)$ and find the mean and variance of X.
(b) Repeat (a) if the pmf is given by

$$p_X(x) = \begin{cases} \frac{1}{3} & x = -2, 0, 2 \\ 0 & \text{otherwise} \end{cases}$$

(a) The pmf $p_X(x)$ is plotted in Fig. 2-24(a). By Eq. (2.26), the mean of X is

$$\mu_X = E(X) = \tfrac{1}{3}(-1 + 0 + 1) = 0$$

Fig. 2-24

By Eq. (2.29), the variance of X is

$$\sigma_X^2 = \operatorname{Var}(X) = E[(X - \mu_X)^2] = E(X^2) = \tfrac{1}{3}\left[(-1)^2 + (0)^2 + (1)^2\right] = \tfrac{2}{3}$$

(b) The pmf $p_X(x)$ is plotted in Fig. 2-24(b). Again by Eqs. (2.26) and (2.29), we obtain

$$\mu_X = E(X) = \tfrac{1}{3}(-2 + 0 + 2) = 0$$

$$\sigma_X^2 = \operatorname{Var}(X) = \tfrac{1}{3}\left[(-2)^2 + (0)^2 + (2)^2\right] = \tfrac{8}{3}$$

Note that the variance of X is a measure of the spread of a distribution about its mean.


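2.28. Let a r.v. X denote the outcome of throwing a fair die. Find the mean and variance of X.


Since the die is fair, the pmf of X is

$$p_X(x) = p_X(k) = \frac{1}{6} \qquad k = 1, 2, \ldots, 6$$

By Eqs. (2.26) and (2.29), the mean and variance of X are

$$\mu_X = E(X) = \tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \tfrac{7}{2} = 3.5$$

$$\sigma_X^2 = \tfrac{1}{6}\left[\left(1 - \tfrac{7}{2}\right)^2 + \left(2 - \tfrac{7}{2}\right)^2 + \left(3 - \tfrac{7}{2}\right)^2 + \left(4 - \tfrac{7}{2}\right)^2 + \left(5 - \tfrac{7}{2}\right)^2 + \left(6 - \tfrac{7}{2}\right)^2\right] = \tfrac{35}{12}$$

Alternatively, the variance of X can be found as follows:

$$E(X^2) = \tfrac{1}{6}(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = \tfrac{91}{6}$$

Hence, by Eq. (2.31),

$$\sigma_X^2 = E(X^2) - [E(X)]^2 = \tfrac{91}{6} - \left(\tfrac{7}{2}\right)^2 = \tfrac{35}{12}$$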
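
Both routes to the variance of a fair die can be reproduced with exact rational arithmetic. The sketch below is illustrative only; the variable names are introduced here and are not from the text.

```python
from fractions import Fraction

faces = range(1, 7)
pmf = {k: Fraction(1, 6) for k in faces}                   # fair die

mean = sum(k * pmf[k] for k in faces)                      # Eq. (2.26)
second_moment = sum(k**2 * pmf[k] for k in faces)          # E(X^2)
var_direct = sum((k - mean) ** 2 * pmf[k] for k in faces)  # Eq. (2.29)
var_shortcut = second_moment - mean**2                     # Eq. (2.31)
print(mean, second_moment, var_direct, var_shortcut)
```

Using `Fraction` avoids rounding, so the two variance expressions can be compared for exact equality.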


2.29. Find the mean and variance of the geometric r.v. X defined by Eq. (2.40).

To find the mean and variance of a geometric r.v. X, we need the following results about the sum of a geometric series and its first and second derivatives. Let

$$g(r) = \sum_{n=0}^{\infty} a r^n = \frac{a}{1 - r} \qquad |r| < 1 \tag{2.101}$$

Then

$$g'(r) = \frac{dg(r)}{dr} = \sum_{n=1}^{\infty} a n r^{n-1} = \frac{a}{(1 - r)^2} \tag{2.102}$$

$$g''(r) = \frac{d^2 g(r)}{dr^2} = \sum_{n=2}^{\infty} a n (n - 1) r^{n-2} = \frac{2a}{(1 - r)^3} \tag{2.103}$$

By Eqs. (2.26) and (2.40), and letting q = 1 - p, the mean of X is given by

$$\mu_X = E(X) = \sum_{x=1}^{\infty} x q^{x-1} p = \frac{p}{(1 - q)^2} = \frac{p}{p^2} = \frac{1}{p} \tag{2.104}$$

where Eq. (2.102) is used with a = p and r = q.

To find the variance of X, we first find E[X(X - 1)]. Now,

$$E[X(X - 1)] = \sum_{x=1}^{\infty} x(x - 1) q^{x-1} p = \sum_{x=2}^{\infty} pq\, x(x - 1) q^{x-2} = \frac{2pq}{(1 - q)^3} = \frac{2pq}{p^3} = \frac{2q}{p^2} = \frac{2(1 - p)}{p^2} \tag{2.105}$$

where Eq. (2.103) is used with a = pq and r = q.

Since $E[X(X - 1)] = E(X^2 - X) = E(X^2) - E(X)$, we have

$$E(X^2) = E[X(X - 1)] + E(X) = \frac{2(1 - p)}{p^2} + \frac{1}{p} = \frac{2 - p}{p^2} \tag{2.106}$$

Then by Eq. (2.31), the variance of X is

$$\sigma_X^2 = \operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{2 - p}{p^2} - \frac{1}{p^2} = \frac{1 - p}{p^2} \tag{2.107}$$


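2.30. Let X be a binomial r.v. with parameters (n, p). Verify Eqs. (2.38) and (2.39).


By Eqs. (2.26) and (2.36), and letting q = 1 - p, we have

$$E(X) = \sum_{k=0}^{n} k\, p_X(k) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k} = \sum_{k=0}^{n} k \frac{n!}{(n - k)!\, k!} p^k q^{n-k} = np \sum_{k=1}^{n} \frac{(n - 1)!}{(n - k)!\, (k - 1)!} p^{k-1} q^{n-k}$$

Letting i = k - 1 and using Eq. (2.86), we obtain

$$E(X) = np \sum_{i=0}^{n-1} \frac{(n - 1)!}{(n - 1 - i)!\, i!} p^i q^{n-1-i} = np \sum_{i=0}^{n-1} \binom{n-1}{i} p^i q^{n-1-i} = np(p + q)^{n-1} = np(1)^{n-1} = np$$

Next,

$$E[X(X - 1)] = \sum_{k=0}^{n} k(k - 1)\, p_X(k) = \sum_{k=0}^{n} k(k - 1) \frac{n!}{(n - k)!\, k!} p^k q^{n-k} = n(n - 1)p^2 \sum_{k=2}^{n} \frac{(n - 2)!}{(n - k)!\, (k - 2)!} p^{k-2} q^{n-k}$$

Similarly, letting i = k - 2 and using Eq. (2.86), we obtain

$$E[X(X - 1)] = n(n - 1)p^2 \sum_{i=0}^{n-2} \frac{(n - 2)!}{(n - 2 - i)!\, i!} p^i q^{n-2-i} = n(n - 1)p^2 \sum_{i=0}^{n-2} \binom{n-2}{i} p^i q^{n-2-i} = n(n - 1)p^2 (p + q)^{n-2} = n(n - 1)p^2$$

Thus,

$$E(X^2) = E[X(X - 1)] + E(X) = n(n - 1)p^2 + np \tag{2.108}$$

and by Eq. (2.31),

$$\sigma_X^2 = \operatorname{Var}(X) = n(n - 1)p^2 + np - (np)^2 = np(1 - p)$$
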
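2.31. Let X be a Poisson r.v. with parameter λ. Verify Eqs. (2.50) and (2.51).

By Eqs. (2.26) and (2.48),

$$E(X) = \sum_{k=0}^{\infty} k\, p_X(k) = \sum_{k=0}^{\infty} k e^{-\lambda} \frac{\lambda^k}{k!} = 0 + \sum_{k=1}^{\infty} e^{-\lambda} \frac{\lambda^k}{(k - 1)!}$$

$$= \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k - 1)!} = \lambda e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$

Next,

$$E[X(X - 1)] = \sum_{k=0}^{\infty} k(k - 1) e^{-\lambda} \frac{\lambda^k}{k!} = \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k - 2)!} = \lambda^2 e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = \lambda^2 e^{-\lambda} e^{\lambda} = \lambda^2$$

Thus,

$$E(X^2) = E[X(X - 1)] + E(X) = \lambda^2 + \lambda \tag{2.109}$$

and by Eq. (2.31),

$$\sigma_X^2 = \operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$
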
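2.32. Let X be a discrete uniform r.v. defined by Eq. (2.52). Verify Eqs. (2.54) and (2.55), that is,

$$\mu_X = E(X) = \tfrac{1}{2}(n + 1) \qquad \sigma_X^2 = \operatorname{Var}(X) = \tfrac{1}{12}(n^2 - 1)$$

By Eqs. (2.52) and (2.26), we have

$$E(X) = \sum_{x=1}^{n} x\, p_X(x) = \frac{1}{n} \sum_{x=1}^{n} x = \frac{1}{n} \cdot \frac{1}{2} n(n + 1) = \frac{1}{2}(n + 1)$$

$$E(X^2) = \sum_{x=1}^{n} x^2 p_X(x) = \frac{1}{n} \sum_{x=1}^{n} x^2 = \frac{1}{n} \cdot \frac{1}{3} n(n + 1)\left(n + \frac{1}{2}\right) = \frac{1}{3}(n + 1)\left(n + \frac{1}{2}\right)$$

Now, using Eq. (2.31), we obtain

$$\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{1}{3}(n + 1)\left(n + \frac{1}{2}\right) - \frac{1}{4}(n + 1)^2 = \frac{1}{12}(n + 1)(n - 1) = \frac{1}{12}(n^2 - 1)$$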


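2.33. Find the mean and variance of the r.v. X of Prob. 2.22.


From Prob. 2.22, the pdf of X is

$$f_X(x) = \begin{cases} 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

By Eq. (2.26), the mean of X is

$$\mu_X = E(X) = \int_0^1 x(2x)\, dx = 2 \left. \frac{x^3}{3} \right|_0^1 = \frac{2}{3}$$

By Eq. (2.27), we have

$$E(X^2) = \int_0^1 x^2 (2x)\, dx = 2 \left. \frac{x^4}{4} \right|_0^1 = \frac{1}{2}$$

Thus, by Eq. (2.31), the variance of X is

$$\sigma_X^2 = \operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18}$$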


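2.34. Let X be a uniform r.v. over (a, b). Verify Eqs. (2.58) and (2.59).


By Eqs. (2.56) and (2.26), the mean of X is

$$\mu_X = E(X) = \int_a^b x \frac{1}{b - a}\, dx = \frac{1}{b - a} \left. \frac{x^2}{2} \right|_a^b = \frac{1}{2} \frac{b^2 - a^2}{b - a} = \frac{1}{2}(b + a)$$

By Eq. (2.27), we have

$$E(X^2) = \int_a^b x^2 \frac{1}{b - a}\, dx = \frac{1}{b - a} \left. \frac{x^3}{3} \right|_a^b = \frac{1}{3}(b^2 + ab + a^2) \tag{2.110}$$

Thus, by Eq. (2.31), the variance of X is

$$\sigma_X^2 = \operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{1}{3}(b^2 + ab + a^2) - \frac{1}{4}(b + a)^2 = \frac{1}{12}(b - a)^2$$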


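2.35. Let X be an exponential r.v. with parameter λ. Verify Eqs. (2.62) and (2.63).

By Eqs. (2.60) and (2.26), the mean of X is

$$\mu_X = E(X) = \int_0^{\infty} x \lambda e^{-\lambda x}\, dx$$

Integrating by parts ($u = x$, $dv = \lambda e^{-\lambda x}\, dx$) yields

$$E(X) = \left. -x e^{-\lambda x} \right|_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\, dx = \frac{1}{\lambda}$$

Next, by Eq. (2.27),

$$E(X^2) = \int_0^{\infty} x^2 \lambda e^{-\lambda x}\, dx$$

Again integrating by parts ($u = x^2$, $dv = \lambda e^{-\lambda x}\, dx$), we obtain

$$E(X^2) = \left. -x^2 e^{-\lambda x} \right|_0^{\infty} + 2 \int_0^{\infty} x e^{-\lambda x}\, dx = \frac{2}{\lambda^2} \tag{2.111}$$

Thus, by Eq. (2.31), the variance of X is

$$\sigma_X^2 = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$
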
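2.36. Let X = N(μ; σ²). Verify Eqs. (2.76) and (2.77).

Using Eqs. (2.71) and (2.26), we have

$$\mu_X = E(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x\, e^{-(x - \mu)^2/(2\sigma^2)}\, dx$$

Writing x as (x - μ) + μ, we have

$$E(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x - \mu)\, e^{-(x - \mu)^2/(2\sigma^2)}\, dx + \mu \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x - \mu)^2/(2\sigma^2)}\, dx$$

Letting y = x - μ in the first integral, we obtain

$$E(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} y\, e^{-y^2/(2\sigma^2)}\, dy + \mu \int_{-\infty}^{\infty} f_X(x)\, dx$$

The first integral is zero, since its integrand is an odd function. Thus, by the property of pdf Eq. (2.22), we get

$$\mu_X = E(X) = \mu$$

Next, by Eq. (2.29),

$$\sigma_X^2 = E[(X - \mu)^2] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x - \mu)^2 e^{-(x - \mu)^2/(2\sigma^2)}\, dx$$

From Eqs. (2.22) and (2.71), we have

$$\int_{-\infty}^{\infty} e^{-(x - \mu)^2/(2\sigma^2)}\, dx = \sigma \sqrt{2\pi}$$

Differentiating with respect to σ, we obtain

$$\int_{-\infty}^{\infty} \frac{(x - \mu)^2}{\sigma^3}\, e^{-(x - \mu)^2/(2\sigma^2)}\, dx = \sqrt{2\pi}$$

Multiplying both sides by $\sigma^2/\sqrt{2\pi}$, we have

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x - \mu)^2 e^{-(x - \mu)^2/(2\sigma^2)}\, dx = \sigma^2$$

Thus,

$$\sigma_X^2 = \operatorname{Var}(X) = \sigma^2$$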

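2.37. Find the mean and variance of a Rayleigh r.v. defined by Eq. (2.95) (Prob. 2.25).


Using Eqs. (2.95) and (2.26), we have

$$\mu_X = E(X) = \int_0^{\infty} x \frac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)}\, dx = \frac{1}{\sigma^2} \int_0^{\infty} x^2 e^{-x^2/(2\sigma^2)}\, dx$$

Now the variance of N(0; σ²) is given by

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x^2 e^{-x^2/(2\sigma^2)}\, dx = \sigma^2$$

Since the integrand is an even function, we have

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_0^{\infty} x^2 e^{-x^2/(2\sigma^2)}\, dx = \frac{1}{2}\sigma^2$$

or

$$\int_0^{\infty} x^2 e^{-x^2/(2\sigma^2)}\, dx = \frac{1}{2}\sqrt{2\pi}\,\sigma^3 = \sqrt{\frac{\pi}{2}}\,\sigma^3$$

Then

$$\mu_X = E(X) = \frac{1}{\sigma^2} \sqrt{\frac{\pi}{2}}\,\sigma^3 = \sqrt{\frac{\pi}{2}}\,\sigma \tag{2.112}$$

Next,

$$E(X^2) = \int_0^{\infty} x^2 \frac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)}\, dx = \frac{1}{\sigma^2} \int_0^{\infty} x^3 e^{-x^2/(2\sigma^2)}\, dx$$

Let $y = x^2/(2\sigma^2)$. Then $dy = x\, dx/\sigma^2$, and so

$$E(X^2) = 2\sigma^2 \int_0^{\infty} y e^{-y}\, dy = 2\sigma^2 \tag{2.113}$$

Hence, by Eq. (2.31),

$$\sigma_X^2 = E(X^2) - [E(X)]^2 = \left(2 - \frac{\pi}{2}\right)\sigma^2 \approx 0.429\,\sigma^2 \tag{2.114}$$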


2.38. Consider a continuous r.v. X with pdf $f_X(x)$. If $f_X(x) = 0$ for x < 0, then show that, for any a > 0,

$$P(X \ge a) \le \frac{\mu_X}{a} \tag{2.115}$$

where $\mu_X = E(X)$. This is known as the Markov inequality.

From Eq. (2.23),

$$P(X \ge a) = \int_a^{\infty} f_X(x)\, dx$$

Since $f_X(x) = 0$ for x < 0,

$$\mu_X = E(X) = \int_0^{\infty} x f_X(x)\, dx \ge \int_a^{\infty} x f_X(x)\, dx \ge a \int_a^{\infty} f_X(x)\, dx$$

Hence,

$$\int_a^{\infty} f_X(x)\, dx = P(X \ge a) \le \frac{\mu_X}{a}$$


2.39. For any a > 0, show that

$$P(|X - \mu_X| \ge a) \le \frac{\sigma_X^2}{a^2} \tag{2.116}$$

where $\mu_X$ and $\sigma_X^2$ are the mean and variance of X, respectively. This is known as the Chebyshev inequality.


From Eq. (2.23),

$$P(|X - \mu_X| \ge a) = \int_{-\infty}^{\mu_X - a} f_X(x)\, dx + \int_{\mu_X + a}^{\infty} f_X(x)\, dx = \int_{|x - \mu_X| \ge a} f_X(x)\, dx$$

By Eq. (2.29),

$$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\, dx \ge \int_{|x - \mu_X| \ge a} (x - \mu_X)^2 f_X(x)\, dx \ge a^2 \int_{|x - \mu_X| \ge a} f_X(x)\, dx$$

Hence,

$$\int_{|x - \mu_X| \ge a} f_X(x)\, dx \le \frac{\sigma_X^2}{a^2} \qquad \text{or} \qquad P(|X - \mu_X| \ge a) \le \frac{\sigma_X^2}{a^2}$$

Note that by setting $a = k\sigma_X$ in Eq. (2.116), we obtain

$$P(|X - \mu_X| \ge k\sigma_X) \le \frac{1}{k^2} \tag{2.117}$$

Equation (2.117) says that the probability that a r.v. will fall k or more standard deviations from its mean is at most $1/k^2$. Notice that nothing at all is said about the distribution function of X. The Chebyshev inequality is therefore quite a general statement. However, when applied to a particular case, it may be quite weak.
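
To illustrate how loose the Chebyshev bound can be in a particular case, the sketch below compares Eq. (2.117) with the exact two-sided tail of a normal r.v. The choice of the normal distribution as the test case and the function names are assumptions made here for illustration; the exact tail uses the standard identity P(|X - μ| ≥ kσ) = erfc(k/√2).

```python
from math import erfc, sqrt

def normal_two_sided_tail(k: float) -> float:
    """Exact P(|X - mu| >= k*sigma) for a normal r.v. X."""
    return erfc(k / sqrt(2))

def chebyshev_bound(k: float) -> float:
    """Upper bound 1/k^2 from Eq. (2.117); valid for any distribution."""
    return 1 / k**2

for k in (1, 2, 3):
    exact = normal_two_sided_tail(k)
    bound = chebyshev_bound(k)
    print(k, round(exact, 4), round(bound, 4))
```

For k = 2, for example, the exact normal tail is about 0.0455 while the bound only guarantees 0.25.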


Special Distributions 


2.40. A binary source generates digits 1 and 0 randomly with probabilities 0.6 and 
0.4, respectively. 


(a) What is the probability that two 1s and three 0s will occur in a five-digit sequence?

(b) What is the probability that at least three 1s will occur in a five-digit sequence?


(a) Let X be the r.v. denoting the number of 1s generated in a five-digit sequence. Since there are only two possible outcomes (1 or 0), the probability of generating 1 is constant, and there are five digits, it is clear that X is a binomial r.v. with parameters (n, p) = (5, 0.6). Hence, by Eq. (2.36), the probability that two 1s and three 0s will occur in a five-digit sequence is

$$P(X = 2) = \binom{5}{2} (0.6)^2 (0.4)^3 \approx 0.23$$

(b) The probability that at least three 1s will occur in a five-digit sequence is

$$P(X \ge 3) = 1 - P(X \le 2)$$

where

$$P(X \le 2) = \sum_{k=0}^{2} \binom{5}{k} (0.6)^k (0.4)^{5-k} \approx 0.317$$

Hence,

$$P(X \ge 3) \approx 1 - 0.317 = 0.683$$


2.41. A fair coin is flipped 10 times. Find the probability of the occurrence of 5 or 6 heads.


Let the r.v. X denote the number of heads occurring when a fair coin is flipped 10 times. Then X is a binomial r.v. with parameters (n, p) = (10, 1/2). Thus, by Eq. (2.36),

$$P(5 \le X \le 6) = \sum_{k=5}^{6} \binom{10}{k} \left(\frac{1}{2}\right)^k \left(\frac{1}{2}\right)^{10-k} \approx 0.451$$


2.42. Let X be a binomial r.v. with parameters (n, p), where 0 < p < 1. Show that as k goes from 0 to n, the pmf $p_X(k)$ of X first increases monotonically and then decreases monotonically, reaching its largest value when k is the largest integer less than or equal to (n + 1)p.


By Eq. (2.36), we have

$$\frac{p_X(k)}{p_X(k - 1)} = \frac{\binom{n}{k} p^k (1 - p)^{n-k}}{\binom{n}{k-1} p^{k-1} (1 - p)^{n-k+1}} = \frac{(n - k + 1)p}{k(1 - p)} \tag{2.118}$$

Hence, $p_X(k) \ge p_X(k - 1)$ if and only if $(n - k + 1)p \ge k(1 - p)$ or $k \le (n + 1)p$. Thus, we see that $p_X(k)$ increases monotonically and reaches its maximum when k is the largest integer less than or equal to (n + 1)p and then decreases monotonically.


2.43. Show that the Poisson distribution can be used as a convenient approximation to the binomial distribution for large n and small p.


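From Eq. (2.36), the pmf of the binomial r.v. with parameters (n, p) is

$$p_X(k) = \binom{n}{k} p^k (1 - p)^{n-k} = \frac{n(n - 1) \cdots (n - k + 1)}{k!}\, p^k (1 - p)^{n-k}$$

Multiplying and dividing the right-hand side by $n^k$, we have

$$\binom{n}{k} p^k (1 - p)^{n-k} = \frac{\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{k - 1}{n}\right)}{k!}\, (np)^k \left(1 - \frac{np}{n}\right)^{n-k}$$

If we let n → ∞ in such a way that np = λ remains constant, then

$$\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{k - 1}{n}\right) \to 1$$

$$\left(1 - \frac{\lambda}{n}\right)^{n-k} = \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k} \to e^{-\lambda}$$

where we used the fact that $\lim_{n \to \infty} (1 - \lambda/n)^n = e^{-\lambda}$.

Thus, in the case of large n and small p,

$$\binom{n}{k} p^k (1 - p)^{n-k} \approx e^{-\lambda} \frac{\lambda^k}{k!} \qquad np = \lambda \tag{2.119}$$

which indicates that the binomial distribution can be approximated by the Poisson distribution.


2.44. A noisy transmission channel has a per-digit error probability p = 0.01.
(a) Calculate the probability of more than one error in 10 received digits.

(b) Repeat (a), using the Poisson approximation Eq. (2.119).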


(a) It is clear that the number of errors in 10 received digits is a binomial r.v. X with parameters (n, p) = (10, 0.01). Then, using Eq. (2.36), we obtain

$$P(X > 1) = 1 - P(X = 0) - P(X = 1) = 1 - \binom{10}{0} (0.01)^0 (0.99)^{10} - \binom{10}{1} (0.01)^1 (0.99)^9 \approx 0.0043$$

(b) Using Eq. (2.119) with λ = np = 10(0.01) = 0.1, we have

$$P(X > 1) = 1 - P(X = 0) - P(X = 1) \approx 1 - e^{-0.1} \frac{(0.1)^0}{0!} - e^{-0.1} \frac{(0.1)^1}{1!} \approx 0.0047$$


2.45. The number of telephone calls arriving at a switchboard during any 10-minute period is known to be a Poisson r.v. X with λ = 2.

(a) Find the probability that more than three calls will arrive during any 10-minute period.

(b) Find the probability that no calls will arrive during any 10-minute period.


(a) From Eq. (2.48), the pmf of X is

$$p_X(k) = P(X = k) = e^{-2} \frac{2^k}{k!} \qquad k = 0, 1, \ldots$$

Thus,

$$P(X > 3) = 1 - P(X \le 3) = 1 - \sum_{k=0}^{3} e^{-2} \frac{2^k}{k!} = 1 - e^{-2}\left(1 + 2 + \frac{4}{2} + \frac{8}{6}\right) \approx 0.143$$

(b) $P(X = 0) = p_X(0) = e^{-2} \approx 0.135$


2.46. Consider the experiment of throwing a pair of fair dice.
(a) Find the probability that it will take less than six tosses to throw a 7.
(b) Find the probability that it will take more than six tosses to throw a 7.


(a) From Prob. 1.37(a), we see that the probability of throwing a 7 on any toss is 1/6. Let X denote the number of tosses required for the first success of throwing a 7. Then, it is clear that X is a geometric r.v. with parameter p = 1/6. Thus, using Eq. (2.90) of Prob. 2.16, we obtain

$$P(X < 6) = P(X \le 5) = F_X(5) = 1 - \left(\frac{5}{6}\right)^5 \approx 0.598$$

(b) Similarly, we get

$$P(X > 6) = 1 - P(X \le 6) = 1 - F_X(6) = 1 - \left[1 - \left(\frac{5}{6}\right)^6\right] = \left(\frac{5}{6}\right)^6 \approx 0.335$$


2.47. Consider the experiment of rolling a fair die. Find the average number of rolls required in order to obtain a 6.


Let X denote the number of trials (rolls) required until the number 6 first appears. Then X is a geometric r.v. with parameter p = 1/6. From Eq. (2.104) of Prob. 2.29, the mean of X is given by

$$\mu_X = E(X) = \frac{1}{p} = \frac{1}{1/6} = 6$$

Thus, the average number of rolls required in order to obtain a 6 is 6.


2.48. Consider an experiment of tossing a fair coin repeatedly until the coin lands heads the sixth time. Find the probability that it takes exactly 10 tosses.


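The number of tosses of a fair coin it takes to get 6 heads is a negative binomial r.v. X with parameters p = 0.5 and k = 6. Thus, by Eq. (2.45), the probability that it takes exactly 10 tosses is

$$P(X = 10) = \binom{10 - 1}{6 - 1} \left(\frac{1}{2}\right)^6 \left(1 - \frac{1}{2}\right)^{10 - 6} = \binom{9}{5} \left(\frac{1}{2}\right)^6 \left(\frac{1}{2}\right)^4 = \frac{9!}{5!\, 4!} \left(\frac{1}{2}\right)^{10} \approx 0.123$$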
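
The negative binomial probability above can be evaluated directly from the pmf of Eq. (2.45). The following is a minimal sketch; `negative_binomial_pmf` is a hypothetical helper name introduced here.

```python
from math import comb

def negative_binomial_pmf(x: int, k: int, p: float) -> float:
    """P(X = x): probability that the k-th success occurs on trial x."""
    if x < k:
        return 0.0  # at least k trials are needed for k successes
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

prob = negative_binomial_pmf(10, 6, 0.5)   # sixth head on the 10th toss
print(round(prob, 3))
```

With p = 0.5 the result equals 126/1024, which rounds to the value 0.123 found above.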


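2.49. Assume that the length of a phone call in minutes is an exponential r.v. X with parameter λ = 1/10. If someone arrives at a phone booth just before you arrive, find the probability that you will have to wait (a) less than 5 minutes, and (b) between 5 and 10 minutes.


(a) From Eq. (2.60), the pdf of X is

$$f_X(x) = \begin{cases} \frac{1}{10}\, e^{-x/10} & x > 0 \\ 0 & x < 0 \end{cases}$$

Then

$$P(X < 5) = \int_0^5 \frac{1}{10}\, e^{-x/10}\, dx = \left. -e^{-x/10} \right|_0^5 = 1 - e^{-0.5} \approx 0.393$$

(b) Similarly,

$$P(5 < X < 10) = \int_5^{10} \frac{1}{10}\, e^{-x/10}\, dx = e^{-0.5} - e^{-1} \approx 0.239$$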


2.50. All manufactured devices and machines fail to work sooner or later. Suppose that the failure rate is constant and the time to failure (in hours) is an exponential r.v. X with parameter λ.

(a) Measurements show that the probability that the time to failure for computer memory chips in a given class exceeds $10^4$ hours is $e^{-1}$ (≈ 0.368). Calculate the value of the parameter λ.

(b) Using the value of the parameter λ determined in part (a), calculate the time $x_0$ such that the probability that the time to failure is less than $x_0$ is 0.05.


(a) From Eq. (2.61), the cdf of X is given by

$$F_X(x) = \begin{cases} 1 - e^{-\lambda x} & x > 0 \\ 0 & x < 0 \end{cases}$$

Now

$$P(X > 10^4) = 1 - P(X \le 10^4) = 1 - F_X(10^4) = 1 - \left(1 - e^{-\lambda 10^4}\right) = e^{-\lambda 10^4} = e^{-1}$$

from which we obtain $\lambda = 10^{-4}$.

(b) We want

$$F_X(x_0) = P(X \le x_0) = 0.05$$

Hence,

$$1 - e^{-\lambda x_0} = 1 - e^{-10^{-4} x_0} = 0.05 \qquad \text{or} \qquad e^{-10^{-4} x_0} = 0.95$$

from which we obtain

$$x_0 = -10^4 \ln(0.95) \approx 513 \text{ hours}$$
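
The two steps of Prob. 2.50 amount to inverting the exponential cdf. The sketch below is illustrative only; `exponential_quantile` is a name introduced here, obtained by solving 1 - exp(-λx) = q for x.

```python
from math import exp, log

def exponential_quantile(q: float, lam: float) -> float:
    """Return x such that F_X(x) = 1 - exp(-lam * x) equals q."""
    return -log(1 - q) / lam

t_ref = 1e4                 # hours; P(X > t_ref) = exp(-1) is given in part (a)
lam = 1 / t_ref             # from exp(-lam * t_ref) = exp(-1)
x0 = exponential_quantile(0.05, lam)
print(lam, round(x0))
```

Substituting `x0` back into the cdf, `1 - exp(-lam * x0)`, should return 0.05 up to rounding.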


2.51. A production line manufactures 1000-ohm (Ω) resistors that have 10 percent tolerance. Let X denote the resistance of a resistor. Assuming that X is a normal r.v. with mean 1000 and variance 2500, find the probability that a resistor picked at random will be rejected.


Let A be the event that a resistor is rejected. Then A = {X < 900} ∪ {X > 1100}. Since {X < 900} ∩ {X > 1100} = ∅, we have

$$P(A) = P(X < 900) + P(X > 1100) = F_X(900) + [1 - F_X(1100)]$$

Since X is a normal r.v. with μ = 1000 and σ² = 2500 (σ = 50), by Eq. (2.74) and Table A (Appendix A),

$$F_X(900) = \Phi\left(\frac{900 - 1000}{50}\right) = \Phi(-2) = 1 - \Phi(2)$$

$$F_X(1100) = \Phi\left(\frac{1100 - 1000}{50}\right) = \Phi(2)$$

Thus,

$$P(A) = 2[1 - \Phi(2)] \approx 0.045$$
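
Instead of reading Φ(2) from a table, the rejection probability can be computed from the error function, using the standard relation Φ(z) = ½[1 + erf(z/√2)]. This is a sketch for illustration; `normal_cdf` is a helper name introduced here.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """cdf of N(mu; sigma^2), written in terms of the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 1000.0, 50.0
# P(A) = P(X < 900) + P(X > 1100)
p_reject = normal_cdf(900, mu, sigma) + (1 - normal_cdf(1100, mu, sigma))
print(round(p_reject, 4))
```

The computed value is about 0.0455, in line with the table-based result quoted above.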


2.52. The radial miss distance [in meters (m)] of the landing point of a parachuting sky diver from the center of the target area is known to be a Rayleigh r.v. X with parameter σ² = 100.


(a) Find the probability that the sky diver will land within a radius of 10 m from the center of the target area.

(b) Find the radius r such that the probability that X > r is $e^{-1}$ (≈ 0.368).

(a) Using Eq. (2.96) of Prob. 2.25, we obtain

$$P(X \le 10) = F_X(10) = 1 - e^{-100/200} = 1 - e^{-0.5} \approx 0.393$$

(b) Now

$$P(X > r) = 1 - P(X \le r) = 1 - F_X(r) = 1 - \left(1 - e^{-r^2/200}\right) = e^{-r^2/200} = e^{-1}$$

from which we obtain $r^2 = 200$ and $r = \sqrt{200} \approx 14.142$ m.


Conditional Distributions 


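2.53. Let X be a Poisson r.v. with parameter λ. Find the conditional pmf of X given B = (X is even).


From Eq. (2.48), the pmf of X is

$$p_X(k) = e^{-\lambda} \frac{\lambda^k}{k!} \qquad k = 0, 1, \ldots$$

Then the probability of event B is

$$P(B) = P(X = 0, 2, 4, \ldots) = \sum_{k\ \text{even}} e^{-\lambda} \frac{\lambda^k}{k!}$$

Let A = {X is odd}. Then the probability of event A is

$$P(A) = P(X = 1, 3, 5, \ldots) = \sum_{k\ \text{odd}} e^{-\lambda} \frac{\lambda^k}{k!}$$

Now

$$\sum_{k\ \text{even}} e^{-\lambda} \frac{\lambda^k}{k!} - \sum_{k\ \text{odd}} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(-\lambda)^k}{k!} = e^{-\lambda} e^{-\lambda} = e^{-2\lambda} \tag{2.120}$$

$$\sum_{k\ \text{even}} e^{-\lambda} \frac{\lambda^k}{k!} + \sum_{k\ \text{odd}} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1 \tag{2.121}$$

Hence, adding Eqs. (2.120) and (2.121), we obtain

$$P(B) = \sum_{k\ \text{even}} e^{-\lambda} \frac{\lambda^k}{k!} = \frac{1}{2}\left(1 + e^{-2\lambda}\right) \tag{2.122}$$

Now, by Eq. (2.81), the pmf of X given B is

$$p_X(k \mid B) = \frac{P\{(X = k) \cap B\}}{P(B)}$$

If k is even, (X = k) ⊂ B and (X = k) ∩ B = (X = k). If k is odd, (X = k) ∩ B = ∅. Hence,

$$p_X(k \mid B) = \begin{cases} \dfrac{P(X = k)}{P(B)} = \dfrac{2 e^{-\lambda} \lambda^k}{\left(1 + e^{-2\lambda}\right) k!} & k \text{ even} \\ \dfrac{P(\varnothing)}{P(B)} = 0 & k \text{ odd} \end{cases}$$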

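2.54. Show that the conditional cdf and pdf of X given the event B = (a < X ≤ b) are as follows:

$$F_X(x \mid a < X \le b) = \begin{cases} 0 & x \le a \\ \dfrac{F_X(x) - F_X(a)}{F_X(b) - F_X(a)} & a < x \le b \\ 1 & x > b \end{cases} \tag{2.123}$$

$$f_X(x \mid a < X \le b) = \begin{cases} 0 & x \le a \\ \dfrac{f_X(x)}{\int_a^b f_X(\xi)\, d\xi} & a < x \le b \\ 0 & x > b \end{cases} \tag{2.124}$$

Substituting B = (a < X ≤ b) in Eq. (2.78), we have

$$F_X(x \mid a < X \le b) = P(X \le x \mid a < X \le b) = \frac{P\{(X \le x) \cap (a < X \le b)\}}{P(a < X \le b)}$$

Now

$$(X \le x) \cap (a < X \le b) = \begin{cases} \varnothing & x \le a \\ (a < X \le x) & a < x \le b \\ (a < X \le b) & x > b \end{cases}$$

Hence,

$$F_X(x \mid a < X \le b) = \frac{P(\varnothing)}{P(a < X \le b)} = 0 \qquad x \le a$$

$$F_X(x \mid a < X \le b) = \frac{P(a < X \le x)}{P(a < X \le b)} = \frac{F_X(x) - F_X(a)}{F_X(b) - F_X(a)} \qquad a < x \le b$$

$$F_X(x \mid a < X \le b) = \frac{P(a < X \le b)}{P(a < X \le b)} = 1 \qquad x > b$$

By Eq. (2.82), the conditional pdf of X given a < X ≤ b is obtained by differentiating Eq. (2.123) with respect to x. Thus,

$$f_X(x \mid a < X \le b) = \begin{cases} 0 & x \le a \\ \dfrac{f_X(x)}{F_X(b) - F_X(a)} = \dfrac{f_X(x)}{\int_a^b f_X(\xi)\, d\xi} & a < x \le b \\ 0 & x > b \end{cases}$$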


2.55. Recall the parachuting sky diver problem (Prob. 2.52). Find the probability of the sky diver landing within a 10-m radius from the center of the target area given that the landing is within 50 m from the center of the target area.

From Eq. (2.96) (Prob. 2.25) with σ² = 100, we have

$$F_X(x) = 1 - e^{-x^2/200}$$

Setting x = 10, b = 50, and a = -∞ in Eq. (2.123), we obtain

$$P(X \le 10 \mid X \le 50) = F_X(10 \mid X \le 50) = \frac{F_X(10)}{F_X(50)} = \frac{1 - e^{-100/200}}{1 - e^{-2500/200}} \approx 0.393$$

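2.56. Let X = N(0; σ²). Find E(X | X > 0) and Var(X | X > 0).

From Eq. (2.71), the pdf of X = N(0; σ²) is

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/(2\sigma^2)}$$

Then by Eq. (2.124),

$$f_X(x \mid X > 0) = \begin{cases} 0 & x < 0 \\ \dfrac{f_X(x)}{\int_0^{\infty} f_X(\xi)\, d\xi} = 2 \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/(2\sigma^2)} & x \ge 0 \end{cases} \tag{2.125}$$

Hence,

$$E(X \mid X > 0) = 2 \frac{1}{\sqrt{2\pi}\,\sigma} \int_0^{\infty} x\, e^{-x^2/(2\sigma^2)}\, dx$$

Let $y = x^2/(2\sigma^2)$. Then $dy = x\, dx/\sigma^2$, and we get

$$E(X \mid X > 0) = \frac{2\sigma}{\sqrt{2\pi}} \int_0^{\infty} e^{-y}\, dy = \sigma \sqrt{\frac{2}{\pi}} \tag{2.126}$$

Next,

$$E(X^2 \mid X > 0) = 2 \frac{1}{\sqrt{2\pi}\,\sigma} \int_0^{\infty} x^2 e^{-x^2/(2\sigma^2)}\, dx = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x^2 e^{-x^2/(2\sigma^2)}\, dx = \operatorname{Var}(X) = \sigma^2 \tag{2.127}$$

Then by Eq. (2.31), we obtain

$$\operatorname{Var}(X \mid X > 0) = E(X^2 \mid X > 0) - [E(X \mid X > 0)]^2 = \sigma^2 \left(1 - \frac{2}{\pi}\right) \approx 0.363\,\sigma^2 \tag{2.128}$$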


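2.57. If X is a nonnegative integer valued r.v. and it satisfies the following memoryless property [see Eq. (2.44)]

$$P(X > i + j \mid X > i) = P(X > j) \qquad i, j \ge 1 \tag{2.129}$$

then show that X must be a geometric r.v.


Let $p_k = P(X = k)$, k = 1, 2, 3, ..., then

$$P(X > j) = \sum_{k=j+1}^{\infty} p_k = S_j \tag{2.130}$$

By Eq. (1.55) and using Eqs. (2.129) and (2.130), we have

$$P(X > i + j \mid X > i) = \frac{P(X > i + j)}{P(X > i)} = \frac{S_{i+j}}{S_i} = P(X > j) = S_j$$

Hence,

$$S_{i+j} = S_i S_j \tag{2.131}$$

Setting i = j = 1 and proceeding repeatedly, we have

$$S_2 = S_1 S_1 = S_1^2, \quad S_3 = S_1 S_2 = S_1^3, \quad \ldots \quad \text{and} \quad S_{i+1} = S_1^{i+1}$$

Now, writing $p = p_1 = P(X = 1)$,

$$S_1 = P(X > 1) = 1 - P(X = 1) = 1 - p$$

Thus,

$$S_x = P(X > x) = S_1^x = (1 - p)^x$$

Finally, by Eq. (2.130), we get

$$P(X = x) = P(X > x - 1) - P(X > x) = (1 - p)^{x-1} - (1 - p)^x = (1 - p)^{x-1}[1 - (1 - p)] = (1 - p)^{x-1} p \qquad x = 1, 2, \ldots$$

Comparing with Eq. (2.40), we conclude that X is a geometric r.v. with parameter p.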


2.58. If X is a nonnegative continuous r.v. and it satisfies the following memoryless property [see Eq. (2.64)]

$$P(X > s + t \mid X > s) = P(X > t) \qquad s, t \ge 0 \tag{2.132}$$

then show that X must be an exponential r.v.

By Eq. (1.55), Eq. (2.132) reduces to

$$P(X > s + t \mid X > s) = \frac{P(\{X > s + t\} \cap \{X > s\})}{P(X > s)} = \frac{P(X > s + t)}{P(X > s)} = P(X > t)$$

Hence,

$$P(X > s + t) = P(X > s)\, P(X > t) \tag{2.133}$$

Let

$$P(X > t) = g(t) \qquad t \ge 0$$

Then Eq. (2.133) becomes

$$g(s + t) = g(s)\, g(t) \qquad s, t \ge 0$$

which is satisfied only by an exponential function, that is,

$$g(t) = e^{\alpha t}$$

Since P(X > ∞) = 0, we let α = -λ (λ > 0); then

$$P(X > x) = g(x) = e^{-\lambda x} \qquad x \ge 0 \tag{2.134}$$

Now if X is a continuous r.v.,

$$F_X(x) = P(X \le x) = 1 - P(X > x) = 1 - e^{-\lambda x} \qquad x \ge 0$$

Comparing with Eq. (2.61), we conclude that X is an exponential r.v. with parameter λ.


SUPPLEMENTARY PROBLEMS 


2.59. Consider the experiment of tossing a coin. Heads appear about once out of every three tosses. If this experiment is repeated, what is the probability of the event that heads appear exactly twice during the first five tosses?


2.60. Consider the experiment of tossing a fair coin three times (Prob. 1.1). Let X be the r.v. that counts the number of heads in each sample point. Find the following probabilities:

(a) $P(X \le 1)$; (b) $P(X > 1)$; and (c) $P(0 < X < 3)$.


2.61. Consider the experiment of throwing two fair dice (Prob. 1.31). Let X be the r.v. indicating the sum of the numbers that appear.

(a) What is the range of X?
(b) Find (i) $P(X = 3)$; (ii) $P(X \le 4)$; and (iii) $P(3 < X \le 7)$.


2.62. Let X denote the number of heads obtained in the flipping of a fair coin twice.

(a) Find the pmf of X.
(b) Compute the mean and the variance of X.


2.63. Consider the discrete r.v. X that has the pmf

$$p_X(x_k) = \left(\tfrac{1}{2}\right)^{x_k} \qquad x_k = 1, 2, 3, \ldots$$

Let A = {ζ: X(ζ) = 1, 3, 5, 7, ...}. Find P(A).


2.64. Consider the function given by

$$p(x) = \begin{cases} \dfrac{k}{x^2} & x = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}$$

where k is a constant. Find the value of k such that p(x) can be the pmf of a discrete r.v. X.


2.65. It is known that the floppy disks produced by company A will be defective with probability 0.01. The company sells the disks in packages of 10 and offers a guarantee of replacement that at most 1 of the 10 disks is defective. Find the probability that a package purchased will have to be replaced.


2.66. Consider an experiment of tossing a fair coin sequentially until "head" appears. What is the probability that the number of tosses is less than 5?


2.67. Given that X is a Poisson r.v. and $p_X(0) = 0.0498$, compute E(X) and $P(X \ge 3)$.


2.68. A digital transmission system has an error probability of $10^{-6}$ per digit. Find the probability of three or more errors in $10^6$ digits by using the Poisson distribution approximation.


2.69. Show that the pmf $p_X(x)$ of a Poisson r.v. X with parameter λ satisfies the following recursion formula:

$$p_X(k + 1) = \frac{\lambda}{k + 1}\, p_X(k) \qquad k = 0, 1, 2, \ldots$$


2.70. The continuous r.v. X has the pdf

$$f_X(x) = \begin{cases} k(x - x^2) & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

where k is a constant. Find the value of k and the cdf of X.


2.71. The continuous r.v. X has the pdf

$$f_X(x) = \begin{cases} k(2x - x^2) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

where k is a constant. Find the value of k and $P(X > 1)$.


2.72. A r.v. X is defined by the cdf

$$F_X(x) = \begin{cases} 0 & x < 0 \\ \frac{1}{2}x & 0 \le x < 1 \\ k & 1 \le x \end{cases}$$

(a) Find the value of k.
(b) Find the type of X.
(c) Find (i) $P(\frac{1}{2} < X \le 1)$; (ii) $P(\frac{1}{2} < X < 1)$; and (iii) $P(X > 2)$.


2.73. It is known that the time (in hours) between consecutive traffic accidents can be described by the exponential r.v. X with parameter λ = 1/60. Find (i) $P(X \le 60)$; (ii) $P(X > 120)$; and (iii) $P(10 < X \le 100)$.


2.74. Binary data are transmitted over a noisy communication channel in a block of 16 binary digits. The probability that a received digit is in error as a result of channel noise is 0.01. Assume that the errors occurring in various digit positions within a block are independent.

(a) Find the mean and the variance of the number of errors per block.
(b) Find the probability that the number of errors per block is greater than or equal to 4.


2.75. Let the continuous r.v. X denote the weight (in pounds) of a package. The range of weight of packages is between 45 and 60 pounds.

(a) Determine the probability that a package weighs more than 50 pounds.
(b) Find the mean and the variance of the weight of packages.


2.76. In the manufacturing of computer memory chips, company A produces one 
defective chip for every nine good chips. Let X be time to failure (in months) 
of chips. It is known that X is an exponential r.v. with parameter $\lambda = \frac{1}{2}$ for a 
defective chip and $\lambda = \frac{1}{10}$ with a good chip. Find the probability that a chip 
purchased randomly will fail before (a) six months of use; and (b) one year 
of use. 


2.77. The median of a continuous r.v. X is the value of $x = x_0$ such that $P(X \ge x_0) 
= P(X \le x_0)$. The mode of X is the value of $x = x_m$ at which the pdf of X 
achieves its maximum value. 

(a) Find the median and mode of an exponential r.v. X with parameter $\lambda$. 
(b) Find the median and mode of a normal r.v. $X = N(\mu, \sigma^2)$. 


2.78. Let the r.v. X denote the number of defective components in a random 
sample of n components, chosen without replacement from a total of N 
components, r of which are defective. The r.v. X is known as the 
hypergeometric r.v. with parameters (N, r, n). 

(a) Find the pmf of X. 
(b) Find the mean and variance of X. 


2.79. A lot consisting of 100 fuses is inspected by the following procedure: Five 
fuses are selected randomly, and if all five “blow” at the specified amperage, 
the lot is accepted. Suppose that the lot contains 10 defective fuses. Find the 
probability of accepting the lot. 


2.80. Let X be the negative binomial r.v. with parameters p and k. Verify Eqs. 
(2.46) and (2.47), that is, 

$$\mu_X = E(X) = \frac{k}{p} \qquad \sigma_X^2 = \operatorname{Var}(X) = \frac{k(1 - p)}{p^2}$$


2.81. Suppose the probability that a bit transmitted through a digital 
communication channel and received in error is 0.1. Assuming that the 
transmissions are independent events, find the probability that the third error 
occurs at the 10th bit. 


2.82. A r.v. X is called a Laplace r.v. if its pdf is given by 

$$f_X(x) = k e^{-\lambda |x|} \qquad \lambda > 0, \quad -\infty < x < \infty$$

where k is a constant. 

(a) Find the value of k. 

(b) Find the cdf of X. 

(c) Find the mean and the variance of X. 


2.83. A r.v. X is called a Cauchy r.v. if its pdf is given by 

$$f_X(x) = \frac{k}{a^2 + x^2} \qquad -\infty < x < \infty$$

where a (> 0) and k are constants. 

(a) Find the value of k. 

(b) Find the cdf of X. 

(c) Find the mean and the variance of X. 


ANSWERS TO SUPPLEMENTARY PROBLEMS 


2.59. 0.329 


— 


2.60. (a) 5° (b) - (c) 7 
(a) Ry = {2,3,4,..., 12} 

2.61. i ie ey te eae 
(b) () ig’ (ii) 6: (iil) 5 


(0) = -, ppl) =a, p,(2) = 
ei (a) p,(Q) ge Pe) 2 Py(2) 


(b) EX) = 1, Var(X) = 
2.63. = 


2.64. k=6/r 


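2.65. 0.004 

2.66. 0.9375 

2.67. E(X) = 3, $P(X \ge 3) = 0.5767$ 

2.68. 0.08 

2.69. Hint: Use Eq. (2.48). 

2.70. k = 6; $F_X(x) = \begin{cases} 0 & x \le 0 \\ 3x^2 - 2x^3 & 0 < x \le 1 \\ 1 & x > 1 \end{cases}$ 

2.71. $k = \frac{3}{4}$; $P(X > 1) = \frac{1}{2}$ 

2.72. (a) k = 1. 
(b) Mixed r.v. 
(c) (i) $\frac{3}{4}$; (ii) $\frac{1}{4}$; (iii) 0 

2.73. (i) 0.632; (ii) 0.135; (iii) 0.658 

2.74. (a) E(X) = 0.16, Var(X) = 0.158 
(b) $0.165 \times 10^{-4}$ 

2.75. Hint: Assume that X is uniformly distributed over (45, 60). 
(a) $\frac{2}{3}$ (b) E(X) = 52.5, Var(X) = 18.75 

2.76. (a) 0.501; (b) 0.729 

2.77. (a) $x_0 = (\ln 2)/\lambda = 0.693/\lambda$, $x_m = 0$ 
(b) $x_0 = x_m = \mu$ 

2.78. Hint: To find E(X), note that 

$$\binom{r}{x} = \frac{r}{x}\binom{r-1}{x-1} \qquad \text{and} \qquad \binom{N}{n} = \sum_{x=0}^{n} \binom{r}{x}\binom{N-r}{n-x}$$

To find Var(X), first find E[X(X - 1)]. 

(a) $p_X(x) = \dfrac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$, x = 0, 1, 2, ..., min{r, n} 

(b) $E(X) = n\left(\dfrac{r}{N}\right)$, $\operatorname{Var}(X) = n\left(\dfrac{r}{N}\right)\left(1 - \dfrac{r}{N}\right)\left(\dfrac{N-n}{N-1}\right)$ 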


2.79. Hint: Let X be a r.v. equal to the number of defective fuses in the sample 
of 5 and use the result of Prob. 2.78. 

0.584 


2.80. Hint: To find E(X), use Maclaurin’s series expansions of the negative 
binomial $h(q) = (1 - q)^{-r}$ and its derivatives h'(q) and h''(q), and note that 

$$h(q) = (1 - q)^{-r} = \sum_{k=0}^{\infty} \binom{r-1+k}{r-1} q^k = \sum_{x=r}^{\infty} \binom{x-1}{r-1} q^{x-r}$$

To find Var(X), first find E[(X - r)(X - r - 1)] using h''(q). 


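2.81. 0.017 

2.82. (a) $k = \lambda/2$ (b) $F_X(x) = \begin{cases} \frac{1}{2} e^{\lambda x} & x < 0 \\ 1 - \frac{1}{2} e^{-\lambda x} & x \ge 0 \end{cases}$ 

(c) E(X) = 0, Var(X) = $2/\lambda^2$ 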


2.83. (a) $k = a/\pi$ (b) $F_X(x) = \dfrac{1}{2} + \dfrac{1}{\pi}\tan^{-1}\left(\dfrac{x}{a}\right)$ 

(c) E(X) = 0, Var(X) does not exist. 


CHAPTER 3 


Multiple Random Variables 


3.1 Introduction 


In many applications it is important to study two or more r.v.’s defined on the same 
sample space. In this chapter, we first consider the case of two r.v.’s, their associated 
distribution, and some properties, such as independence of the r.v.’s. These concepts 
are then extended to the case of many r.v.’s defined on the same sample space. 


3.2 Bivariate Random Variables 


A. Definition: 


Let S be the sample space of a random experiment. Let X and Y be two r.v.’s. Then the 
pair (X, Y) is called a bivariate r.v. (or two-dimensional random vector) if each of X 
and Y associates a real number with every element of S. Thus, the bivariate r.v. (X, Y) 
can be considered as a function that to each point $\zeta$ in S assigns a point (x, y) in the 
plane (Fig. 3-1). The range space of the bivariate r.v. (X, Y) is denoted by $R_{XY}$ and 
defined by 

$$R_{XY} = \{(x, y);\ \zeta \in S \text{ and } X(\zeta) = x,\ Y(\zeta) = y\}$$


Fig. 3-1 (X, Y) as a function from S to the plane. 


If the r.v.’s X and Y are each, by themselves, discrete r.v.’s, then (X, Y) is called a 
discrete bivariate r.v. Similarly, if X and Y are each, by themselves, continuous r.v.’s, 
then (X, Y) is called a continuous bivariate r.v. If one of X and Y is discrete while the 
other is continuous, then (X, Y) is called a mixed bivariate r.v. 


3.3 Joint Distribution Functions 
A. Definition: 


The joint cumulative distribution function (or joint cdf) of X and Y, denoted by $F_{XY}(x, y)$, 
is the function defined by 

$$F_{XY}(x, y) = P(X \le x,\ Y \le y) \tag{3.1}$$

The event $(X \le x, Y \le y)$ in Eq. (3.1) is equivalent to the event $A \cap B$, where A and B 
are events of S defined by 

$$A = \{\zeta \in S;\ X(\zeta) \le x\} \qquad \text{and} \qquad B = \{\zeta \in S;\ Y(\zeta) \le y\} \tag{3.2}$$

and $P(A) = F_X(x)$, $P(B) = F_Y(y)$. Thus, 

$$F_{XY}(x, y) = P(A \cap B) \tag{3.3}$$

If, for particular values of x and y, A and B were independent events of S, then by Eq. 
(1.62), 

$$F_{XY}(x, y) = P(A \cap B) = P(A)P(B) = F_X(x)F_Y(y)$$


B. Independent Random Variables: 
Two r.v.’s X and Y will be called independent if 

$$F_{XY}(x, y) = F_X(x)F_Y(y) \tag{3.4}$$

for every value of x and y. 


C. Properties of $F_{XY}(x, y)$: 


The joint cdf of two r.v.’s has many properties analogous to those of the cdf of a single 
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(3.12) 


Note that the left-hand side of Eq. (3.12) is equal to $P(x_1 < X \le x_2,\ y_1 < Y \le y_2)$ (Prob. 3.5). 


D. Marginal Distribution Functions: 
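Now 

$$\lim_{y \to \infty} (X \le x,\ Y \le y) = (X \le x,\ Y \le \infty) = (X \le x)$$

since the condition $y \le \infty$ is always satisfied. Then 

$$\lim_{y \to \infty} F_{XY}(x, y) = F_{XY}(x, \infty) = F_X(x) \tag{3.13}$$

Similarly, 

$$\lim_{x \to \infty} F_{XY}(x, y) = F_{XY}(\infty, y) = F_Y(y) \tag{3.14}$$

The cdf’s $F_X(x)$ and $F_Y(y)$, when obtained by Eqs. (3.13) and (3.14), are referred to as 
the marginal cdf’s of X and Y, respectively. 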


3.4 Discrete Random Variables—Joint Probability Mass Functions 


A. Joint Probability Mass Functions: 


Let (X, Y) be a discrete bivariate r.v., and let (X, Y) take on the values $(x_i, y_j)$ for a 
certain allowable set of integers i and j. Let 

$$p_{XY}(x_i, y_j) = P(X = x_i,\ Y = y_j) \tag{3.15}$$

The function $p_{XY}(x_i, y_j)$ is called the joint probability mass function (joint pmf) of (X, Y). 


B. Properties of $p_{XY}(x_i, y_j)$: 


ese bt ie) oe 1 fi 
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where the summation is over the points (x;, y;) in the range space R, corresponding to 
the event A. The joint cdf of a discrete bivariate r.v. (X, Y) is given by 


$$F_{XY}(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} p_{XY}(x_i, y_j) \tag{3.19}$$
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C. Marginal Probability Mass Functions: 


Suppose that for a fixed value $X = x_i$, the r.v. Y can take on only the possible values $y_j$ 
(j = 1, 2, ..., n). Then 

$$P(X = x_i) = p_X(x_i) = \sum_{y_j} p_{XY}(x_i, y_j) \tag{3.20}$$

where the summation is taken over all possible pairs $(x_i, y_j)$ with $x_i$ fixed. Similarly, 

$$P(Y = y_j) = p_Y(y_j) = \sum_{x_i} p_{XY}(x_i, y_j) \tag{3.21}$$

where the summation is taken over all possible pairs $(x_i, y_j)$ with $y_j$ fixed. The pmf’s 
$p_X(x_i)$ and $p_Y(y_j)$, when obtained by Eqs. (3.20) and (3.21), are referred to as the 
marginal pmf’s of X and Y, respectively. 


D. Independent Random Variables: 
If X and Y are independent r.v.’s, then (Prob. 3.10) 

$$p_{XY}(x_i, y_j) = p_X(x_i)\, p_Y(y_j) \tag{3.22}$$
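
The marginal sums of Eqs. (3.20) and (3.21) and the product test of Eq. (3.22) can be sketched in a few lines of Python. This is only an illustration: the helper names `marginals` and `is_independent` and the small joint pmf table are hypothetical and do not come from the text.

```python
from collections import defaultdict
from fractions import Fraction


def marginals(joint):
    """Return (p_X, p_Y) for a joint pmf given as {(x, y): probability}."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p  # Eq. (3.20): sum over y_j with x_i fixed
        p_y[y] += p  # Eq. (3.21): sum over x_i with y_j fixed
    return dict(p_x), dict(p_y)


def is_independent(joint):
    """Check Eq. (3.22) at every point of the product of the two ranges."""
    p_x, p_y = marginals(joint)
    return all(
        joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
        for x in p_x
        for y in p_y
    )


# Hypothetical joint pmf, chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}
p_x, p_y = marginals(joint)
print(p_x, p_y, is_independent(joint))
```

Exact fractions are used so that the equality test in Eq. (3.22) is not disturbed by floating-point rounding.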


3.5 Continuous Random Variables—Joint Probability Density Functions 
A. Joint Probability Density Functions: 
Let (X, Y) be a continuous bivariate r.v. with cdf $F_{XY}(x, y)$ and let 

$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} \tag{3.23}$$

The function $f_{XY}(x, y)$ is called the joint probability density function (joint pdf) of 
(X, Y). By integrating Eq. (3.23), we have 

$$F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(\xi, \eta)\, d\eta\, d\xi \tag{3.24}$$


B. Properties of $f_{XY}(x, y)$: 
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C. Marginal Probability Density Functions: 
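By Eq. (3.13), 

$$F_X(x) = F_{XY}(x, \infty) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f_{XY}(\xi, \eta)\, d\eta\, d\xi$$

Hence, 

$$f_X(x) = \frac{dF_X(x)}{dx} = \int_{-\infty}^{\infty} f_{XY}(x, \eta)\, d\eta$$

or 

$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy \tag{3.30}$$

Similarly, 

$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx \tag{3.31}$$

The pdf’s $f_X(x)$ and $f_Y(y)$, when obtained by Eqs. (3.30) and (3.31), are referred to as 
the marginal pdf’s of X and Y, respectively. 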


D. Independent Random Variables: 
If X and Y are independent r.v.’s, by Eq. (3.4), 

$$F_{XY}(x, y) = F_X(x)F_Y(y)$$

Then 

$$\frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} = \frac{\partial}{\partial x}F_X(x)\, \frac{\partial}{\partial y}F_Y(y)$$

or 

$$f_{XY}(x, y) = f_X(x) f_Y(y) \tag{3.32}$$

analogous with Eq. (3.22) for the discrete case. Thus, we say that the continuous r.v.’s 
X and Y are independent r.v.’s if and only if Eq. (3.32) is satisfied. 


3.6 Conditional Distributions 


A. Conditional Probability Mass Functions: 


If (X, Y) is a discrete bivariate r.v. with joint pmf $p_{XY}(x_i, y_j)$, then the conditional pmf 
of Y, given that $X = x_i$, is defined by 

$$p_{Y|X}(y_j | x_i) = \frac{p_{XY}(x_i, y_j)}{p_X(x_i)} \qquad p_X(x_i) > 0 \tag{3.33}$$

Similarly, we can define $p_{X|Y}(x_i | y_j)$ as 

$$p_{X|Y}(x_i | y_j) = \frac{p_{XY}(x_i, y_j)}{p_Y(y_j)} \qquad p_Y(y_j) > 0 \tag{3.34}$$

B. Properties of $p_{Y|X}(y_j | x_i)$: 

$$1. \quad 0 \le p_{Y|X}(y_j | x_i) \le 1 \tag{3.35}$$

$$2. \quad \sum_{y_j} p_{Y|X}(y_j | x_i) = 1 \tag{3.36}$$

Notice that if X and Y are independent, then by Eq. (3.22), 

$$p_{Y|X}(y_j | x_i) = p_Y(y_j) \qquad \text{and} \qquad p_{X|Y}(x_i | y_j) = p_X(x_i) \tag{3.37}$$


C. Conditional Probability Density Functions: 
If (X, Y) is a continuous bivariate r.v. with joint pdf $f_{XY}(x, y)$, then the conditional pdf 
of Y, given that X = x, is defined by 

$$f_{Y|X}(y | x) = \frac{f_{XY}(x, y)}{f_X(x)} \qquad f_X(x) > 0 \tag{3.38}$$
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Similarly, we can define fy y(x|y) as 
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As in the discrete case, if X and Y are independent, then by Eq. (3.32), 
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3.7 Covariance and Correlation Coefficient 
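The (k, n)th moment of a bivariate r.v. (X, Y) is defined by 

$$m_{kn} = E(X^k Y^n) = \begin{cases} \displaystyle\sum_{y_j} \sum_{x_i} x_i^k y_j^n\, p_{XY}(x_i, y_j) & \text{(discrete case)} \\[2ex] \displaystyle\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^k y^n f_{XY}(x, y)\, dx\, dy & \text{(continuous case)} \end{cases} \tag{3.43}$$

If n = 0, we obtain the kth moment of X, and if k = 0, we obtain the nth moment of Y. 
Thus, 

$$m_{10} = E(X) = \mu_X \qquad \text{and} \qquad m_{01} = E(Y) = \mu_Y \tag{3.44}$$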


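If (X, Y) is a discrete bivariate r.v., then using Eqs. (3.43), (3.20), and (3.21), we obtain 

$$\mu_X = E(X) = \sum_{y_j} \sum_{x_i} x_i\, p_{XY}(x_i, y_j) = \sum_{x_i} x_i \left[ \sum_{y_j} p_{XY}(x_i, y_j) \right] = \sum_{x_i} x_i\, p_X(x_i) \tag{3.45a}$$

$$\mu_Y = E(Y) = \sum_{x_i} \sum_{y_j} y_j\, p_{XY}(x_i, y_j) = \sum_{y_j} y_j \left[ \sum_{x_i} p_{XY}(x_i, y_j) \right] = \sum_{y_j} y_j\, p_Y(y_j) \tag{3.45b}$$

Similarly, we have 

$$E(X^2) = \sum_{y_j} \sum_{x_i} x_i^2\, p_{XY}(x_i, y_j) = \sum_{x_i} x_i^2\, p_X(x_i) \tag{3.46a}$$

$$E(Y^2) = \sum_{y_j} \sum_{x_i} y_j^2\, p_{XY}(x_i, y_j) = \sum_{y_j} y_j^2\, p_Y(y_j) \tag{3.46b}$$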
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If (X, Y) is a continuous bivariate r.v., then using Eqs. (3.43), (3.30), and (3.31), we 
obtain 


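$$\mu_X = E(X) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f_{XY}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} x \left[ \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy \right] dx = \int_{-\infty}^{\infty} x f_X(x)\, dx \tag{3.47a}$$

$$\mu_Y = E(Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f_{XY}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} y \left[ \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx \right] dy = \int_{-\infty}^{\infty} y f_Y(y)\, dy \tag{3.47b}$$

Similarly, we have 

$$E(X^2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^2 f_{XY}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} x^2 f_X(x)\, dx \tag{3.48a}$$

$$E(Y^2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y^2 f_{XY}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} y^2 f_Y(y)\, dy \tag{3.48b}$$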


The variances of X and Y are easily obtained by using Eq. (2.31). The (1, 1)th joint 
moment of (X, Y), 

$$m_{11} = E(XY) \tag{3.49}$$

is called the correlation of X and Y. If E(XY) = 0, then we say that X and Y are 
orthogonal. The covariance of X and Y, denoted by Cov(X, Y) or $\sigma_{XY}$, is defined by 

$$\operatorname{Cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] \tag{3.50}$$

Expanding Eq. (3.50), we obtain 

$$\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) \tag{3.51}$$

If Cov(X, Y) = 0, then we say that X and Y are uncorrelated. From Eq. (3.51), we 
see that X and Y are uncorrelated if 

$$E(XY) = E(X)E(Y) \tag{3.52}$$

Note that if X and Y are independent, then it can be shown that they are uncorrelated 
(Prob. 3.32), but the converse is not true in general; that is, the fact that X and Y are 
uncorrelated does not, in general, imply that they are independent (Probs. 3.33, 3.34, 
and 3.38). The correlation coefficient, denoted by $\rho(X, Y)$ or $\rho_{XY}$, is defined by 

$$\rho(X, Y) = \rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \tag{3.53}$$

It can be shown that (Prob. 3.36) 

$$|\rho_{XY}| \le 1 \qquad \text{or} \qquad -1 \le \rho_{XY} \le 1 \tag{3.54}$$

Note that the correlation coefficient of X and Y is a measure of linear dependence 
between X and Y (see Prob. 4.46). 
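
The remark that uncorrelated r.v.’s need not be independent can be checked numerically. The sketch below uses a hypothetical example that is not taken from the text: X is uniform on {-1, 0, 1} and Y = X². The covariance of Eq. (3.51) and the correlation coefficient of Eq. (3.53) both vanish, yet the joint pmf does not factor as in Eq. (3.22).

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint pmf: X uniform on {-1, 0, 1} and Y = X**2.
joint = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}


def expect(g):
    """E[g(X, Y)] computed directly from the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mean_x, mean_y = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mean_x * mean_y   # Eq. (3.51)
var_x = expect(lambda x, y: x * x) - mean_x ** 2
var_y = expect(lambda x, y: y * y) - mean_y ** 2
rho = cov / sqrt(var_x * var_y)                       # Eq. (3.53)

# Independence would require p_XY(0, 0) = p_X(0) p_Y(0), Eq. (3.22).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
dependent = joint[(0, 0)] != p_x0 * p_y0

print(cov, rho, dependent)
```

Here Y is completely determined by X, so the dependence is as strong as it can be; the correlation coefficient misses it because the relationship is not linear.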


3.8 Conditional Means and Conditional Variances 


If (X, Y) is a discrete bivariate r.v. with joint pmf $p_{XY}(x_i, y_j)$, then the conditional mean 
(or conditional expectation) of Y, given that $X = x_i$, is defined by 

$$\mu_{Y|x_i} = E(Y | x_i) = \sum_{y_j} y_j\, p_{Y|X}(y_j | x_i) \tag{3.55}$$

The conditional variance of Y, given that $X = x_i$, is defined by 

$$\sigma^2_{Y|x_i} = \operatorname{Var}(Y | x_i) = E[(Y - \mu_{Y|x_i})^2 | x_i] = \sum_{y_j} (y_j - \mu_{Y|x_i})^2\, p_{Y|X}(y_j | x_i) \tag{3.56}$$

which can be reduced to 

$$\operatorname{Var}(Y | x_i) = E(Y^2 | x_i) - [E(Y | x_i)]^2 \tag{3.57}$$

The conditional mean of X, given that $Y = y_j$, and the conditional variance of X, given 
that $Y = y_j$, are given by similar expressions. Note that the conditional mean of Y, given 
that $X = x_i$, is a function of $x_i$ alone. Similarly, the conditional mean of X, given that 
$Y = y_j$, is a function of $y_j$ alone. 

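If (X, Y) is a continuous bivariate r.v. with joint pdf $f_{XY}(x, y)$, the conditional mean 
of Y, given that X = x, is defined by 

$$\mu_{Y|x} = E(Y | x) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y | x)\, dy \tag{3.58}$$

The conditional variance of Y, given that X = x, is defined by 

$$\sigma^2_{Y|x} = \operatorname{Var}(Y | x) = E[(Y - \mu_{Y|x})^2 | x] = \int_{-\infty}^{\infty} (y - \mu_{Y|x})^2 f_{Y|X}(y | x)\, dy \tag{3.59}$$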
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which can be reduced to 


$$\operatorname{Var}(Y | x) = E(Y^2 | x) - [E(Y | x)]^2 \tag{3.60}$$


The conditional mean of X, given that Y = y, and the conditional variance of X, given 
that Y = y, are given by similar expressions. Note that the conditional mean of Y, given 
that X = x, is a function of x alone. Similarly, the conditional mean of X, given that Y = 
y, is a function of y alone (Prob. 3.40). 
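
For the discrete case, Eqs. (3.33), (3.55), and (3.57) translate directly into code. The following sketch is illustrative only: the function names are hypothetical, and the joint pmf is the binary-channel table that appears in Prob. 3.12, written as exact fractions.

```python
from fractions import Fraction as F

# Joint pmf of the binary channel of Prob. 3.12, as {(x, y): probability}.
joint = {(0, 0): F(9, 20), (0, 1): F(1, 20), (1, 0): F(1, 10), (1, 1): F(2, 5)}


def conditional_pmf_y(joint, x):
    """p_{Y|X}(y | x) from Eq. (3.33); assumes p_X(x) > 0."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}


def conditional_mean_var(joint, x):
    """Return (E(Y | x), Var(Y | x)) using Eqs. (3.55) and (3.57)."""
    pmf = conditional_pmf_y(joint, x)
    mean = sum(y * p for y, p in pmf.items())
    second_moment = sum(y * y * p for y, p in pmf.items())
    return mean, second_moment - mean ** 2


for x in (0, 1):
    print(x, conditional_mean_var(joint, x))
```

The two results differ, which illustrates the closing remark of this section: the conditional mean of Y is a function of the conditioning value x.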


3.9 N-Variate Random Variables 


In previous sections, the extension from one r.v. to two r.v.’s has been made. The 
concepts can be extended easily to any number of r.v.’s defined on the same sample 
space. In this section we briefly describe some of the extensions. 


A. Definitions: 


Given an experiment, the n-tuple of r.v.’s $(X_1, X_2, \ldots, X_n)$ is called an n-variate r.v. (or 
n-dimensional random vector) if each $X_i$, i = 1, 2, ..., n, associates a real number with 
every sample point $\zeta \in S$. Thus, an n-variate r.v. is simply a rule associating an n-tuple 
of real numbers with every $\zeta \in S$. 
Let $(X_1, \ldots, X_n)$ be an n-variate r.v. on S. Then its joint cdf is defined as 

$$F_{X_1 \cdots X_n}(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n) \tag{3.61}$$

Note that 

$$F_{X_1 \cdots X_n}(\infty, \ldots, \infty) = 1 \tag{3.62}$$

The marginal joint cdf’s are obtained by setting the appropriate $X_i$’s to $+\infty$ in Eq. 
(3.61). For example, 

$$F_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1}) = F_{X_1 \cdots X_{n-1} X_n}(x_1, \ldots, x_{n-1}, \infty) \tag{3.63}$$

$$F_{X_1 X_2}(x_1, x_2) = F_{X_1 X_2 X_3 \cdots X_n}(x_1, x_2, \infty, \ldots, \infty) \tag{3.64}$$

A discrete n-variate r.v. will be described by a joint pmf defined by 

$$p_{X_1 \cdots X_n}(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n) \tag{3.65}$$

The probability of any n-dimensional event A is found by summing Eq. (3.65) over the 
points in the n-dimensional range space $R_A$ corresponding to the event A: 

$$P[(X_1, \ldots, X_n) \in A] = \sum \cdots \sum_{(x_1, \ldots, x_n) \in R_A} p_{X_1 \cdots X_n}(x_1, \ldots, x_n) \tag{3.66}$$
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The marginal pmf’s of one or more of the r.v.’s are obtained by summing Eq. (3.65) 
appropriately. For example, 


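$$p_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1}) = \sum_{x_n} p_{X_1 \cdots X_n}(x_1, \ldots, x_n) \tag{3.69}$$

$$p_{X_1}(x_1) = \sum_{x_2} \cdots \sum_{x_n} p_{X_1 \cdots X_n}(x_1, \ldots, x_n) \tag{3.70}$$

Conditional pmf’s are defined similarly. For example, 

$$p_{X_n | X_1 \cdots X_{n-1}}(x_n | x_1, \ldots, x_{n-1}) = \frac{p_{X_1 \cdots X_n}(x_1, \ldots, x_n)}{p_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1})} \tag{3.71}$$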


A continuous n-variate r.v. will be described by a joint pdf defined by 
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The marginal pdf’s of one or more of the r.v.’s are obtained by integrating Eq. (3.72) 
appropriately. For example, 


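$$f_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1}) = \int_{-\infty}^{\infty} f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\, dx_n \tag{3.77}$$

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\, dx_2 \cdots dx_n \tag{3.78}$$

Conditional pdf’s are defined similarly. For example, 

$$f_{X_n | X_1 \cdots X_{n-1}}(x_n | x_1, \ldots, x_{n-1}) = \frac{f_{X_1 \cdots X_n}(x_1, \ldots, x_n)}{f_{X_1 \cdots X_{n-1}}(x_1, \ldots, x_{n-1})} \tag{3.79}$$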


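The r.v.’s $X_1, \ldots, X_n$ are said to be mutually independent if 

$$p_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} p_{X_i}(x_i) \tag{3.80}$$

for the discrete case, and 

$$f_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i) \tag{3.81}$$

for the continuous case. 
The mean (or expectation) of $X_i$ in $(X_1, \ldots, X_n)$ is defined as 

$$\mu_i = E(X_i) = \begin{cases} \displaystyle\sum_{x_1} \cdots \sum_{x_n} x_i\, p_{X_1 \cdots X_n}(x_1, \ldots, x_n) & \text{(discrete case)} \\[2ex] \displaystyle\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\, dx_1 \cdots dx_n & \text{(continuous case)} \end{cases} \tag{3.82}$$

The variance of $X_i$ is defined as 

$$\sigma_i^2 = \operatorname{Var}(X_i) = E[(X_i - \mu_i)^2] \tag{3.83}$$

The covariance of $X_i$ and $X_j$ is defined as 

$$\sigma_{ij} = \operatorname{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)] \tag{3.84}$$

The correlation coefficient of $X_i$ and $X_j$ is defined as 

$$\rho_{ij} = \frac{\operatorname{Cov}(X_i, X_j)}{\sigma_i \sigma_j} = \frac{\sigma_{ij}}{\sigma_i \sigma_j} \tag{3.85}$$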


3.10 Special Distributions 


A. Multinomial Distribution: 


The multinomial distribution is an extension of the binomial distribution. An 
experiment is termed a multinomial trial with parameters $p_1, p_2, \ldots, p_k$, if it has the 
following conditions: 

1. The experiment has k possible outcomes that are mutually exclusive and 
exhaustive, say $A_1, A_2, \ldots, A_k$. 

2. $P(A_i) = p_i$, i = 1, ..., k, and 

$$\sum_{i=1}^{k} p_i = 1 \tag{3.86}$$

Consider an experiment which consists of n repeated, independent, multinomial trials 
with parameters $p_1, p_2, \ldots, p_k$. Let $X_i$ be the r.v. denoting the number of trials which 
result in $A_i$. Then $(X_1, X_2, \ldots, X_k)$ is called the multinomial r.v. with parameters 
$(n, p_1, p_2, \ldots, p_k)$ and its pmf is given by (Prob. 3.46) 

$$p_{X_1 X_2 \cdots X_k}(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} \tag{3.87}$$
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k 
for $x_i = 0, 1, \ldots, n$, i = 1, ..., k, such that $\sum_{i=1}^{k} x_i = n$. 


Note that when k = 2, the multinomial distribution reduces to the binomial 
distribution. 
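
Equation (3.87) is straightforward to evaluate. The sketch below is an illustration under stated assumptions: the function name `multinomial_pmf` and the numerical values are hypothetical, and the caller is assumed to pass counts and probabilities of equal length with probabilities summing to 1. It also checks the k = 2 reduction to the binomial pmf.

```python
from math import comb, factorial, isclose, prod


def multinomial_pmf(counts, probs):
    """Evaluate Eq. (3.87) for counts (x_1, ..., x_k) and probabilities (p_1, ..., p_k)."""
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)  # each partial quotient is an integer
    return coeff * prod(p ** x for p, x in zip(probs, counts))


# k = 2 reduces to the binomial pmf with parameters (n, p) = (5, 0.3).
two_outcomes = multinomial_pmf((3, 2), (0.3, 0.7))
binomial = comb(5, 3) * 0.3 ** 3 * 0.7 ** 2
print(two_outcomes, binomial, isclose(two_outcomes, binomial))

# A fair die rolled 6 times with every face appearing exactly once.
print(multinomial_pmf((1,) * 6, (1 / 6,) * 6))
```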


B. Bivariate Normal Distribution: 


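A bivariate r.v. (X, Y) is said to be a bivariate normal (or Gaussian) r.v. if its joint pdf 
is given by 

$$f_{XY}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y (1 - \rho^2)^{1/2}} \exp\left[ -\frac{1}{2} q(x, y) \right] \tag{3.88}$$

where 

$$q(x, y) = \frac{1}{1 - \rho^2} \left[ \left( \frac{x - \mu_X}{\sigma_X} \right)^2 - 2\rho \left( \frac{x - \mu_X}{\sigma_X} \right) \left( \frac{y - \mu_Y}{\sigma_Y} \right) + \left( \frac{y - \mu_Y}{\sigma_Y} \right)^2 \right] \tag{3.89}$$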
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and [l,, , 0.7, 0,7 are the means and variances of X and Y, respectively. It can be 


shown that $\rho$ is the correlation coefficient of X and Y (Prob. 3.50) and that X and Y are 
independent when $\rho = 0$ (Prob. 3.49). 
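
A short numerical sketch can make the role of the correlation coefficient concrete. The code evaluates the bivariate normal pdf of Eq. (3.88) and confirms at one sample point that, with zero correlation, it equals the product of two one-dimensional normal pdfs. The function names and the parameter values are hypothetical choices made for illustration.

```python
from math import exp, isclose, pi, sqrt


def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint pdf of Eq. (3.88); requires sigma_x, sigma_y > 0 and |rho| < 1."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho ** 2)
    return exp(-q / 2) / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho ** 2))


def normal_pdf(x, mu, sigma):
    """One-dimensional normal pdf N(mu, sigma**2)."""
    return exp(-(((x - mu) / sigma) ** 2) / 2) / (sigma * sqrt(2 * pi))


# With rho = 0 the joint pdf factors into the two marginal normal pdfs.
joint_value = bivariate_normal_pdf(0.3, -1.2, 1.0, 2.0, 0.5, 1.5, 0.0)
product = normal_pdf(0.3, 1.0, 0.5) * normal_pdf(-1.2, 2.0, 1.5)
print(joint_value, product, isclose(joint_value, product))
```

A single-point check is of course not a proof; the factorization for all (x, y) is the subject of Prob. 3.49.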


C. N-variate Normal Distribution: 


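Let $(X_1, \ldots, X_n)$ be an n-variate r.v. defined on a sample space S. Let $\mathbf{X}$ be an 
n-dimensional random vector expressed as an n x 1 matrix: 

$$\mathbf{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} \tag{3.90}$$

Let $\mathbf{x}$ be an n-dimensional vector (n x 1 matrix) defined by 

$$\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \tag{3.91}$$

The n-variate r.v. $(X_1, \ldots, X_n)$ is called an n-variate normal r.v. if its joint pdf is given 
by 

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\det K|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T K^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] \tag{3.92}$$

where T denotes the “transpose,” $\boldsymbol{\mu}$ is the vector mean, K is the covariance matrix 
given by 

$$\boldsymbol{\mu} = E[\mathbf{X}] = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \begin{bmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{bmatrix} \tag{3.93}$$

$$K = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{bmatrix} \qquad \sigma_{ij} = \operatorname{Cov}(X_i, X_j) \tag{3.94}$$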
and det K is the determinant of the matrix K. Note that /x(x) stands for fy, y Qt, --+ 
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SOLVED PROBLEMS 


Bivariate Random Variables and Joint Distribution Functions 


3.1. Consider an experiment of tossing a fair coin twice. Let (X, Y) be a bivariate r.v., 
where X is the number of heads that occurs in the two tosses and Y is the number 
of tails that occurs in the two tosses. 


(a) What is the range $R_X$ of X? 

(b) What is the range $R_Y$ of Y? 

(c) Find and sketch the range $R_{XY}$ of (X, Y). 

(d) Find P(X = 2, Y = 0), P(X = 0, Y = 2), and P(X = 1, Y = 1). 


The sample space S of the experiment is 


S = {HH, HT, TH, TT} 


(a) $R_X$ = {0, 1, 2} 
(b) $R_Y$ = {0, 1, 2} 
(c) $R_{XY}$ = {(2, 0), (1, 1), (0, 2)}, which is sketched in Fig. 3-2. 


Fig. 3-2 


(d) Since the coin is fair, we have 

$$P(X = 2, Y = 0) = P\{HH\} = \tfrac{1}{4}$$
$$P(X = 0, Y = 2) = P\{TT\} = \tfrac{1}{4}$$
$$P(X = 1, Y = 1) = P\{HT, TH\} = \tfrac{1}{2}$$


3.2. Consider a bivariate r.v. (X, Y). Find the region of the xy plane corresponding to 
the events 

$$A = \{X + Y \le 2\} \qquad B = \{X^2 + Y^2 < 4\}$$

$$C = \{\min(X, Y) \le 2\} \qquad D = \{\max(X, Y) \le 2\}$$

The region corresponding to event A is expressed by $x + y \le 2$, which is shown 
in Fig. 3-3(a), that is, the region below and including the straight line x + y = 2. 
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Fig. 3-3 Regions corresponding to certain joint events. 


The region corresponding to event B is expressed by $x^2 + y^2 < 2^2$, which is 
shown in Fig. 3-3(b), that is, the region within the circle with its center at the 
origin and radius 2. 


The region corresponding to event C is shown in Fig. 3-3(c), which is found by 
noting that 

$$\{\min(X, Y) \le 2\} = (X \le 2) \cup (Y \le 2)$$

The region corresponding to event D is shown in Fig. 3-3(d), which is found by 
noting that 

$$\{\max(X, Y) \le 2\} = (X \le 2) \cap (Y \le 2)$$
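3.3. Verify Eqs. (3.7), (3.8a), and (3.8b). 

Since $\{X \le \infty, Y \le \infty\} = S$ and by Eq. (1.36), 

$$P(X \le \infty, Y \le \infty) = F_{XY}(\infty, \infty) = P(S) = 1$$

Next, as we know, from Eq. (2.8), 

$$P(X = -\infty) = P(Y = -\infty) = 0$$

Since $(X = -\infty, Y \le y) \subset (X = -\infty)$ and $(X \le x, Y = -\infty) \subset (Y = -\infty)$, 
and by Eq. (1.41), we have 

$$P(X = -\infty, Y \le y) = F_{XY}(-\infty, y) = 0$$
$$P(X \le x, Y = -\infty) = F_{XY}(x, -\infty) = 0$$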


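3.4. Verify Eqs. (3.10) and (3.11). 

Clearly 

$$(X \le x_2, Y \le y) = (X \le x_1, Y \le y) \cup (x_1 < X \le x_2, Y \le y)$$

The two events on the right-hand side are disjoint; hence by Eq. (1.37), 

$$P(X \le x_2, Y \le y) = P(X \le x_1, Y \le y) + P(x_1 < X \le x_2, Y \le y)$$

or 

$$P(x_1 < X \le x_2, Y \le y) = P(X \le x_2, Y \le y) - P(X \le x_1, Y \le y) = F_{XY}(x_2, y) - F_{XY}(x_1, y)$$

Similarly, 

$$(X \le x, Y \le y_2) = (X \le x, Y \le y_1) \cup (X \le x, y_1 < Y \le y_2)$$

Again the two events on the right-hand side are disjoint, hence 

$$P(X \le x, Y \le y_2) = P(X \le x, Y \le y_1) + P(X \le x, y_1 < Y \le y_2)$$

or 

$$P(X \le x, y_1 < Y \le y_2) = P(X \le x, Y \le y_2) - P(X \le x, Y \le y_1) = F_{XY}(x, y_2) - F_{XY}(x, y_1)$$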


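3.5. Verify Eq. (3.12). 

Clearly 

$$(x_1 < X \le x_2, Y \le y_2) = (x_1 < X \le x_2, Y \le y_1) \cup (x_1 < X \le x_2, y_1 < Y \le y_2)$$

The two events on the right-hand side are disjoint; hence 

$$P(x_1 < X \le x_2, Y \le y_2) = P(x_1 < X \le x_2, Y \le y_1) + P(x_1 < X \le x_2, y_1 < Y \le y_2)$$

Then using Eq. (3.10), we obtain 

$$\begin{aligned} P(x_1 < X \le x_2, y_1 < Y \le y_2) &= P(x_1 < X \le x_2, Y \le y_2) - P(x_1 < X \le x_2, Y \le y_1) \\ &= F_{XY}(x_2, y_2) - F_{XY}(x_1, y_2) - [F_{XY}(x_2, y_1) - F_{XY}(x_1, y_1)] \\ &= F_{XY}(x_2, y_2) - F_{XY}(x_1, y_2) - F_{XY}(x_2, y_1) + F_{XY}(x_1, y_1) \end{aligned} \tag{3.95}$$

Since the probability must be nonnegative, we conclude that 

$$F_{XY}(x_2, y_2) - F_{XY}(x_1, y_2) - F_{XY}(x_2, y_1) + F_{XY}(x_1, y_1) \ge 0$$

if $x_2 \ge x_1$ and $y_2 \ge y_1$. 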
3.6. Consider a function 

$$F(x, y) = \begin{cases} 1 - e^{-(x + y)} & 0 \le x < \infty,\ 0 \le y < \infty \\ 0 & \text{otherwise} \end{cases}$$

Can this function be a joint cdf of a bivariate r.v. (X, Y)? 


It is clear that F(x, y) satisfies properties 1 to 5 of a cdf [Eqs. (3.5) to (3.9)]. But 
substituting F(x, y) in Eq. (3.12) and setting $x_2 = y_2 = 2$ and $x_1 = y_1 = 1$, we get 

$$\begin{aligned} F(2, 2) - F(1, 2) - F(2, 1) + F(1, 1) &= (1 - e^{-4}) - (1 - e^{-3}) - (1 - e^{-3}) + (1 - e^{-2}) \\ &= -e^{-4} + 2e^{-3} - e^{-2} = -(e^{-2} - e^{-1})^2 < 0 \end{aligned}$$

Thus, property 7 [Eq. (3.12)] is not satisfied. Hence, F(x, y) cannot be a joint 
cdf. 
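
The failure of the rectangle condition in Prob. 3.6 is easy to confirm numerically. In the sketch below, the helper name `rectangle_mass` is hypothetical; it evaluates the left-hand side of Eq. (3.12) for the candidate function and, for contrast, for the product form $(1 - e^{-x})(1 - e^{-y})$, which is a valid joint cdf.

```python
from math import exp


def candidate(x, y):
    """The function F(x, y) of Prob. 3.6."""
    return 1 - exp(-(x + y)) if x >= 0 and y >= 0 else 0.0


def product_cdf(x, y):
    """A valid joint cdf used for contrast: independent unit exponentials."""
    return (1 - exp(-x)) * (1 - exp(-y)) if x >= 0 and y >= 0 else 0.0


def rectangle_mass(F, x1, x2, y1, y2):
    """Left-hand side of Eq. (3.12): the mass F would assign to (x1, x2] x (y1, y2]."""
    return F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)


bad = rectangle_mass(candidate, 1, 2, 1, 2)
good = rectangle_mass(product_cdf, 1, 2, 1, 2)
print(bad, good)  # the first value is negative, so candidate is not a joint cdf
```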


3.7. Consider a bivariate r.v. (X, Y). Show that if X and Y are independent, then every 
event of the form $(a < X \le b)$ is independent of every event of the form $(c < Y \le d)$. 

By definition (3.4), if X and Y are independent, we have 

$$F_{XY}(x, y) = F_X(x)F_Y(y)$$

Setting $x_1 = a$, $x_2 = b$, $y_1 = c$, and $y_2 = d$ in Eq. (3.95) (Prob. 3.5), we obtain 

$$\begin{aligned} P(a < X \le b, c < Y \le d) &= F_{XY}(b, d) - F_{XY}(a, d) - F_{XY}(b, c) + F_{XY}(a, c) \\ &= F_X(b)F_Y(d) - F_X(a)F_Y(d) - F_X(b)F_Y(c) + F_X(a)F_Y(c) \\ &= [F_X(b) - F_X(a)][F_Y(d) - F_Y(c)] \\ &= P(a < X \le b)\, P(c < Y \le d) \end{aligned}$$

which indicates that event $(a < X \le b)$ and event $(c < Y \le d)$ are independent [Eq. 
(1.62)]. 


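3.8. The joint cdf of a bivariate r.v. (X, Y) is given by 

$$F_{XY}(x, y) = \begin{cases} (1 - e^{-\alpha x})(1 - e^{-\beta y}) & x \ge 0,\ y \ge 0,\ \alpha, \beta > 0 \\ 0 & \text{otherwise} \end{cases}$$

(a) Find the marginal cdf’s of X and Y. 
(b) Show that X and Y are independent. 
(c) Find $P(X \le 1, Y \le 1)$, $P(X \le 1)$, $P(Y > 1)$, and $P(X > x, Y > y)$. 


(a) By Eqs. (3.13) and (3.14), the marginal cdf’s of X and Y are 

$$F_X(x) = F_{XY}(x, \infty) = \begin{cases} 1 - e^{-\alpha x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

$$F_Y(y) = F_{XY}(\infty, y) = \begin{cases} 1 - e^{-\beta y} & y \ge 0 \\ 0 & y < 0 \end{cases}$$

(b) Since $F_{XY}(x, y) = F_X(x)F_Y(y)$, X and Y are independent. 

(c) 

$$P(X \le 1, Y \le 1) = F_{XY}(1, 1) = (1 - e^{-\alpha})(1 - e^{-\beta})$$
$$P(X \le 1) = F_X(1) = 1 - e^{-\alpha}$$
$$P(Y > 1) = 1 - P(Y \le 1) = 1 - F_Y(1) = e^{-\beta}$$

By De Morgan’s law (1.21), we have 

$$\overline{(X > x) \cap (Y > y)} = \overline{(X > x)} \cup \overline{(Y > y)} = (X \le x) \cup (Y \le y)$$

Then by Eq. (1.44), 

$$\begin{aligned} P[\overline{(X > x) \cap (Y > y)}] &= P(X \le x) + P(Y \le y) - P(X \le x, Y \le y) \\ &= F_X(x) + F_Y(y) - F_{XY}(x, y) \\ &= (1 - e^{-\alpha x}) + (1 - e^{-\beta y}) - (1 - e^{-\alpha x})(1 - e^{-\beta y}) \\ &= 1 - e^{-\alpha x} e^{-\beta y} \end{aligned}$$

Finally, by Eq. (1.39), we obtain 

$$P(X > x, Y > y) = 1 - P[\overline{(X > x) \cap (Y > y)}] = e^{-\alpha x} e^{-\beta y}$$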
3.9. The joint cdf of a bivariate r.v. (X, Y) is given by 

$$F_{XY}(x, y) = \begin{cases} 0 & x < 0 \text{ or } y < 0 \\ p_1 & 0 \le x < a,\ 0 \le y < b \\ p_2 & x \ge a,\ 0 \le y < b \\ p_3 & 0 \le x < a,\ y \ge b \\ 1 & x \ge a,\ y \ge b \end{cases}$$

(a) Find the marginal cdf’s of X and Y. 
(b) Find the conditions on $p_1$, $p_2$, and $p_3$ for which X and Y are independent. 


(a) By Eq. (3.13), the marginal cdf of X is given by 

$$F_X(x) = F_{XY}(x, \infty) = \begin{cases} 0 & x < 0 \\ p_3 & 0 \le x < a \\ 1 & x \ge a \end{cases}$$

By Eq. (3.14), the marginal cdf of Y is given by 

$$F_Y(y) = F_{XY}(\infty, y) = \begin{cases} 0 & y < 0 \\ p_2 & 0 \le y < b \\ 1 & y \ge b \end{cases}$$

(b) For X and Y to be independent, by Eq. (3.4), we must have $F_{XY}(x, y) = 
F_X(x)F_Y(y)$. Thus, for $0 \le x < a$, $0 \le y < b$, we must have $p_1 = p_2 p_3$ for X and 
Y to be independent. 


Discrete Bivariate Random Variables—Joint Probability Mass Functions 


3.10. Verify Eq. (3.22). 


If X and Y are independent, then by Eq. (1.62), 

$$p_{XY}(x_i, y_j) = P(X = x_i, Y = y_j) = P(X = x_i)P(Y = y_j) = p_X(x_i)\, p_Y(y_j)$$


3.11. Two fair dice are thrown. Consider a bivariate r.v. (X, Y). Let X = 0 or 1 
according to whether the first die shows an even number or an odd number of 
dots. Similarly, let Y= 0 or 1 according to the second die. 


(a) Find the range $R_{XY}$ of (X, Y). 
(b) Find the joint pmf’s of (X, Y). 


(a) The range of (X, Y) is 

$$R_{XY} = \{(0, 0), (0, 1), (1, 0), (1, 1)\}$$

(b) It is clear that X and Y are independent and 

$$P(X = 0) = P(X = 1) = \tfrac{3}{6} = \tfrac{1}{2}$$
$$P(Y = 0) = P(Y = 1) = \tfrac{3}{6} = \tfrac{1}{2}$$

Thus 

$$p_{XY}(i, j) = P(X = i, Y = j) = P(X = i)P(Y = j) = \tfrac{1}{4} \qquad i, j = 0, 1$$


3.12. Consider the binary communication channel shown in Fig. 3-4 (Prob. 1.52). Let 
(X, Y) be a bivariate r.v., where X is the input to the channel and Y is the output 
of the channel. Let P(X = 0) = 0.5, P(Y = 1 | X = 0) = 0.1, and P(Y = 0 | X = 1) = 
0.2. 


Fig. 3-4 Binary communication channel. 


(a) Find the joint pmf’s of (X, Y). 
(b) Find the marginal pmf’s of X and Y. 
(c) Are X and Y independent? 


(a) From the results of Prob. 1.62, we found that 

P(X = 1) = 1 - P(X = 0) = 0.5 
P(Y = 0 | X = 0) = 0.9 P(Y = 1 | X = 1) = 0.8 


Then by Eq. (1.57), we obtain 

P(X = 0, Y = 0) = P(Y = 0 | X = 0)P(X = 0) = 0.9(0.5) = 0.45 
P(X = 0, Y = 1) = P(Y = 1 | X = 0)P(X = 0) = 0.1(0.5) = 0.05 
P(X = 1, Y = 0) = P(Y = 0 | X = 1)P(X = 1) = 0.2(0.5) = 0.1 
P(X = 1, Y = 1) = P(Y = 1 | X = 1)P(X = 1) = 0.8(0.5) = 0.4 


Hence, the joint pmf’s of (X, Y) are 

$p_{XY}(0, 0) = 0.45$ $p_{XY}(0, 1) = 0.05$ 
$p_{XY}(1, 0) = 0.1$ $p_{XY}(1, 1) = 0.4$ 


(b) By Eq. (3.20), the marginal pmf’s of X are 

$$p_X(0) = \sum_{y_j} p_{XY}(0, y_j) = 0.45 + 0.05 = 0.5$$
$$p_X(1) = \sum_{y_j} p_{XY}(1, y_j) = 0.1 + 0.4 = 0.5$$

By Eq. (3.21), the marginal pmf’s of Y are 

$$p_Y(0) = \sum_{x_i} p_{XY}(x_i, 0) = 0.45 + 0.1 = 0.55$$
$$p_Y(1) = \sum_{x_i} p_{XY}(x_i, 1) = 0.05 + 0.4 = 0.45$$

(c) Now 

$$p_X(0)\, p_Y(0) = 0.5(0.55) = 0.275 \ne p_{XY}(0, 0) = 0.45$$

Hence, X and Y are not independent. 
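
The computation of Prob. 3.12 can be reproduced with a short script. This is a minimal sketch: the variable names are hypothetical, and the input and transition probabilities are those stated in the problem, written as exact fractions.

```python
from fractions import Fraction as F

# Input distribution and channel transition probabilities P(Y = y | X = x).
p_x = {0: F(1, 2), 1: F(1, 2)}
p_y_given_x = {0: {0: F(9, 10), 1: F(1, 10)},
               1: {0: F(1, 5), 1: F(4, 5)}}

# Joint pmf: P(X = x, Y = y) = P(Y = y | X = x) P(X = x).
joint = {(x, y): p_y_given_x[x][y] * p_x[x] for x in p_x for y in (0, 1)}

# Marginal pmf of Y by Eq. (3.21).
p_y = {y: sum(joint[(x, y)] for x in p_x) for y in (0, 1)}

# Independence test of Eq. (3.22).
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x in p_x for y in p_y)

print(joint)
print(p_y, independent)
```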


3.13. Consider an experiment of drawing randomly three balls from an urn containing 
two red, three white, and four blue balls. Let (X, Y) be a bivariate r.v. where X 
and Y denote, respectively, the number of red and white balls chosen. 


(a) Find the range of (X, Y). 

(b) Find the joint pmf’s of (X, Y). 

(c) Find the marginal pmf’s of X and Y. 
(d) Are X and Y independent? 


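(a) The range of (X, Y) is given by 

$$R_{XY} = \{(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)\}$$

(b) The joint pmf’s of (X, Y) 

$$p_{XY}(i, j) = P(X = i, Y = j) \qquad i = 0, 1, 2 \quad j = 0, 1, 2, 3$$

are given as follows: 

$$p_{XY}(0, 0) = \binom{4}{3} \Big/ \binom{9}{3} = \frac{4}{84} \qquad p_{XY}(0, 1) = \binom{3}{1}\binom{4}{2} \Big/ \binom{9}{3} = \frac{18}{84}$$

$$p_{XY}(0, 2) = \binom{3}{2}\binom{4}{1} \Big/ \binom{9}{3} = \frac{12}{84} \qquad p_{XY}(0, 3) = \binom{3}{3} \Big/ \binom{9}{3} = \frac{1}{84}$$

$$p_{XY}(1, 0) = \binom{2}{1}\binom{4}{2} \Big/ \binom{9}{3} = \frac{12}{84} \qquad p_{XY}(1, 1) = \binom{2}{1}\binom{3}{1}\binom{4}{1} \Big/ \binom{9}{3} = \frac{24}{84}$$

$$p_{XY}(1, 2) = \binom{2}{1}\binom{3}{2} \Big/ \binom{9}{3} = \frac{6}{84} \qquad p_{XY}(2, 0) = \binom{2}{2}\binom{4}{1} \Big/ \binom{9}{3} = \frac{4}{84}$$

$$p_{XY}(2, 1) = \binom{2}{2}\binom{3}{1} \Big/ \binom{9}{3} = \frac{3}{84}$$

which are expressed in tabular form as in Table 3-1. 


TABLE 3-1 $p_{XY}(i, j)$ 

          j = 0     j = 1     j = 2     j = 3 
i = 0     4/84      18/84     12/84     1/84 
i = 1     12/84     24/84     6/84      0 
i = 2     4/84      3/84      0         0 


(c) The marginal pmf’s of X are obtained from Table 3-1 by computing the row 
sums, and the marginal pmf’s of Y are obtained by computing the column 
sums. Thus, 

$$p_X(0) = \frac{35}{84} \qquad p_X(1) = \frac{42}{84} \qquad p_X(2) = \frac{7}{84}$$

$$p_Y(0) = \frac{20}{84} \qquad p_Y(1) = \frac{45}{84} \qquad p_Y(2) = \frac{18}{84} \qquad p_Y(3) = \frac{1}{84}$$

(d) Since 

$$p_{XY}(0, 0) = \frac{4}{84} \ne p_X(0)\, p_Y(0) = \frac{35}{84}\left(\frac{20}{84}\right)$$

X and Y are not independent. 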
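
The table of Prob. 3.13 can be regenerated by counting combinations. The sketch below is illustrative, with hypothetical variable names; it builds the joint pmf from the urn contents, then forms the row and column sums of Eqs. (3.20) and (3.21).

```python
from fractions import Fraction
from math import comb

RED, WHITE, BLUE, DRAWS = 2, 3, 4, 3
total = comb(RED + WHITE + BLUE, DRAWS)  # 84 equally likely samples

joint = {}
for i in range(RED + 1):          # i red balls
    for j in range(WHITE + 1):    # j white balls
        blue = DRAWS - i - j      # the remaining balls must be blue
        if 0 <= blue <= BLUE:
            ways = comb(RED, i) * comb(WHITE, j) * comb(BLUE, blue)
            joint[(i, j)] = Fraction(ways, total)

p_x = {i: sum(p for (a, _), p in joint.items() if a == i) for i in range(RED + 1)}
p_y = {j: sum(p for (_, b), p in joint.items() if b == j) for j in range(WHITE + 1)}

print(sorted(joint))
print(p_x, p_y)
print(joint[(0, 0)] == p_x[0] * p_y[0])  # False: X and Y are not independent
```

Note that `Fraction` reduces to lowest terms, so 35/84 is printed as 5/12.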


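3.14. The joint pmf of a bivariate r.v. (X, Y) is given by 

$$p_{XY}(x_i, y_j) = \begin{cases} k(2x_i + y_j) & x_i = 1, 2;\ y_j = 1, 2 \\ 0 & \text{otherwise} \end{cases}$$

where k is a constant. 

(a) Find the value of k. 

(b) Find the marginal pmf’s of X and Y. 
(c) Are X and Y independent? 


(a) By Eq. (3.17), 

$$\sum_{x_i} \sum_{y_j} p_{XY}(x_i, y_j) = \sum_{x_i = 1}^{2} \sum_{y_j = 1}^{2} k(2x_i + y_j) = k[(2 + 1) + (2 + 2) + (4 + 1) + (4 + 2)] = k(18) = 1$$

Thus, $k = \frac{1}{18}$. 

(b) By Eq. (3.20), the marginal pmf’s of X are 

$$p_X(x_i) = \sum_{y_j} p_{XY}(x_i, y_j) = \sum_{y_j = 1}^{2} \frac{1}{18}(2x_i + y_j) = \frac{1}{18}(2x_i + 1) + \frac{1}{18}(2x_i + 2) = \frac{1}{18}(4x_i + 3) \qquad x_i = 1, 2$$

By Eq. (3.21), the marginal pmf’s of Y are 

$$p_Y(y_j) = \sum_{x_i} p_{XY}(x_i, y_j) = \sum_{x_i = 1}^{2} \frac{1}{18}(2x_i + y_j) = \frac{1}{18}(2 + y_j) + \frac{1}{18}(4 + y_j) = \frac{1}{18}(2y_j + 6) \qquad y_j = 1, 2$$

(c) Now $p_X(x_i)\, p_Y(y_j) \ne p_{XY}(x_i, y_j)$; hence X and Y are not independent. 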


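3.15. The joint pmf of a bivariate r.v. (X, Y) is given by 

$$p_{XY}(x_i, y_j) = \begin{cases} k x_i^2 y_j & x_i = 1, 2;\ y_j = 1, 2, 3 \\ 0 & \text{otherwise} \end{cases}$$

where k is a constant. 

(a) Find the value of k. 

(b) Find the marginal pmf’s of X and Y. 
(c) Are X and Y independent? 


(a) By Eq. (3.17), 

$$\sum_{x_i} \sum_{y_j} p_{XY}(x_i, y_j) = \sum_{x_i = 1}^{2} \sum_{y_j = 1}^{3} k x_i^2 y_j = k(1 + 2 + 3 + 4 + 8 + 12) = k(30) = 1$$

Thus, $k = \frac{1}{30}$. 

(b) By Eq. (3.20), the marginal pmf’s of X are 

$$p_X(x_i) = \sum_{y_j} p_{XY}(x_i, y_j) = \sum_{y_j = 1}^{3} \frac{1}{30} x_i^2 y_j = \frac{1}{5} x_i^2 \qquad x_i = 1, 2$$

By Eq. (3.21), the marginal pmf’s of Y are 

$$p_Y(y_j) = \sum_{x_i} p_{XY}(x_i, y_j) = \sum_{x_i = 1}^{2} \frac{1}{30} x_i^2 y_j = \frac{1}{6} y_j \qquad y_j = 1, 2, 3$$

(c) Now 

$$p_X(x_i)\, p_Y(y_j) = \frac{1}{30} x_i^2 y_j = p_{XY}(x_i, y_j)$$

Hence, X and Y are independent. 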


3.16. Consider an experiment of tossing two coins three times. Coin A is fair, but coin 
B is not fair, with $P(H) = \frac{1}{4}$ and $P(T) = \frac{3}{4}$. Consider a bivariate r.v. (X, Y), 
where X denotes the number of heads resulting from coin A and Y denotes the 
number of heads resulting from coin B. 

(a) Find the range of (X, Y). 


(b) Find the joint pmf’s of (X, Y). 
(c) Find P(X = Y), P(X > Y), and $P(X + Y \le 4)$. 


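(a) The range of (X, Y) is given by 

$$R_{XY} = \{(i, j):\ i, j = 0, 1, 2, 3\}$$

(b) It is clear that the r.v.’s X and Y are independent, and they are both binomial 
r.v.’s with parameters $(n, p) = (3, \frac{1}{2})$ and $(n, p) = (3, \frac{1}{4})$, respectively. Thus, by 
Eq. (2.36), we have 

$$p_X(0) = P(X = 0) = \binom{3}{0}\left(\frac{1}{2}\right)^3 = \frac{1}{8} \qquad p_X(1) = P(X = 1) = \binom{3}{1}\left(\frac{1}{2}\right)^3 = \frac{3}{8}$$

$$p_X(2) = P(X = 2) = \binom{3}{2}\left(\frac{1}{2}\right)^3 = \frac{3}{8} \qquad p_X(3) = P(X = 3) = \binom{3}{3}\left(\frac{1}{2}\right)^3 = \frac{1}{8}$$

$$p_Y(0) = P(Y = 0) = \binom{3}{0}\left(\frac{1}{4}\right)^0\left(\frac{3}{4}\right)^3 = \frac{27}{64} \qquad p_Y(1) = P(Y = 1) = \binom{3}{1}\left(\frac{1}{4}\right)\left(\frac{3}{4}\right)^2 = \frac{27}{64}$$

$$p_Y(2) = P(Y = 2) = \binom{3}{2}\left(\frac{1}{4}\right)^2\left(\frac{3}{4}\right) = \frac{9}{64} \qquad p_Y(3) = P(Y = 3) = \binom{3}{3}\left(\frac{1}{4}\right)^3\left(\frac{3}{4}\right)^0 = \frac{1}{64}$$

Since X and Y are independent, the joint pmf’s of (X, Y) are 

$$p_{XY}(i, j) = p_X(i)\, p_Y(j) \qquad i, j = 0, 1, 2, 3$$

which are tabulated in Table 3-2. 


Table 3-2 $p_{XY}(i, j)$ 

          j = 0      j = 1      j = 2      j = 3 
i = 0     27/512     27/512     9/512      1/512 
i = 1     81/512     81/512     27/512     3/512 
i = 2     81/512     81/512     27/512     3/512 
i = 3     27/512     27/512     9/512      1/512 


(c) From Table 3-2, we have 

$$P(X = Y) = \sum_{i=0}^{3} p_{XY}(i, i) = \frac{1}{512}(27 + 81 + 27 + 1) = \frac{136}{512}$$

$$P(X > Y) = \sum_{i > j} p_{XY}(i, j) = \frac{1}{512}(81 + 81 + 27 + 81 + 27 + 9) = \frac{306}{512}$$

$$P(X + Y > 4) = \sum_{i + j > 4} p_{XY}(i, j) = p_{XY}(2, 3) + p_{XY}(3, 2) + p_{XY}(3, 3) = \frac{1}{512}(3 + 9 + 1) = \frac{13}{512}$$

Thus, $P(X + Y \le 4) = 1 - P(X + Y > 4) = 1 - \dfrac{13}{512} = \dfrac{499}{512}$ 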
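
The three probabilities of Prob. 3.16(c) can be cross-checked by enumerating the joint pmf. The sketch below is illustrative only; `binomial_pmf` is a hypothetical helper, and independence of the two coins is used to form the joint pmf as a product, as in the solution.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """List of P(K = k), k = 0..n, for a binomial r.v. with parameters (n, p)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


p_x = binomial_pmf(3, Fraction(1, 2))   # heads from fair coin A
p_y = binomial_pmf(3, Fraction(1, 4))   # heads from biased coin B

# Independence gives the joint pmf as a product, Eq. (3.22).
joint = {(i, j): p_x[i] * p_y[j] for i in range(4) for j in range(4)}

p_equal = sum(p for (i, j), p in joint.items() if i == j)
p_greater = sum(p for (i, j), p in joint.items() if i > j)
p_sum_le_4 = sum(p for (i, j), p in joint.items() if i + j <= 4)

print(p_equal, p_greater, p_sum_le_4)
```

The fractions are printed in lowest terms, so 136/512 appears as 17/64 and 306/512 as 153/256.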


Continuous Bivariate Random Variables—Probability Density Functions 


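3.17. The joint pdf of a bivariate r.v. (X, Y) is given by 

$$f_{XY}(x, y) = \begin{cases} k(x + y) & 0 < x < 2,\ 0 < y < 2 \\ 0 & \text{otherwise} \end{cases}$$

where k is a constant. 

(a) Find the value of k. 

(b) Find the marginal pdf’s of X and Y. 
(c) Are X and Y independent? 


(a) By Eq. (3.26), 

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx\, dy = k \int_0^2 \int_0^2 (x + y)\, dx\, dy = k \int_0^2 \left[ \frac{x^2}{2} + xy \right]_{x=0}^{x=2} dy = k \int_0^2 (2 + 2y)\, dy = 8k = 1$$

Thus, $k = \frac{1}{8}$. 

(b) By Eq. (3.30), the marginal pdf of X is 

$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy = \frac{1}{8} \int_0^2 (x + y)\, dy = \frac{1}{8} \left[ xy + \frac{y^2}{2} \right]_{y=0}^{y=2} = \begin{cases} \frac{1}{4}(x + 1) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$

Since $f_{XY}(x, y)$ is symmetric with respect to x and y, the marginal pdf of Y is 

$$f_Y(y) = \begin{cases} \frac{1}{4}(y + 1) & 0 < y < 2 \\ 0 & \text{otherwise} \end{cases}$$

(c) Since $f_{XY}(x, y) \ne f_X(x) f_Y(y)$, X and Y are not independent. 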
3.18. The joint pdf of a bivariate r.v. (X, Y) is given by 
ky 0O<x=1L0<y<1 
Fay Gs Y)= fe otherwise 


where k is a constant. 

(a) Find the value of k. 

(6) Are X and Y independent? 
(c) Find P(X¥+ Y< 1). 


(a) The range space Ryy is shown in Fig. 3-5(a). By Eq. (3.26), 


y 


¥ 


(a) 


Fig. 3-5 


9 | ; 
| fo far y)dx dy=k i ie xy dx dy=k ila ie | dy 
0, 


Thus k = 4. 


(b) To determine whether X and Y are independent, we must find the marginal
pdf’s of X and Y. By Eq. (3.30),

\[ f_X(x) = \begin{cases} \int_0^1 4xy\,dy = 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} \]

By symmetry,

\[ f_Y(y) = \begin{cases} 2y & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases} \]

Since $f_{XY}(x, y) = f_X(x) f_Y(y)$, X and Y are independent.


(c) The region in the xy plane corresponding to the event (X + Y< 1) is shown 
in Fig. 3-5(5) as a shaded area. Then 


\[ P(X + Y < 1) = \int_0^1\int_0^{1-y} 4xy\,dx\,dy = \int_0^1 4y\left[\frac{x^2}{2}\right]_{x=0}^{x=1-y} dy = \int_0^1 2y(1 - y)^2\,dy = \frac{1}{6} \]
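
The value P(X + Y < 1) = 1/6 can also be estimated by simulation. The sketch below is an illustration under stated assumptions, not the method of the text: it draws X and Y independently from the marginal pdf 2x on (0, 1) by inverse-transform sampling (if U is uniform on (0, 1), then the square root of U has cdf x² and hence pdf 2x). The function name and the fixed seed are hypothetical choices.

```python
import math
import random


def estimate_prob_sum_below_one(n_samples, seed=0):
    """Monte Carlo estimate of P(X + Y < 1) for independent X, Y with pdf 2x on (0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = math.sqrt(rng.random())  # inverse-transform sample with pdf 2x
        y = math.sqrt(rng.random())
        if x + y < 1:
            hits += 1
    return hits / n_samples


estimate = estimate_prob_sum_below_one(200_000)
print(estimate, 1 / 6)
```

With 200,000 samples the standard error is below 0.001, so the estimate should agree with 1/6 to about two decimal places.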
3.19. The joint pdf of a bivariate r.v. (X, Y) is given by

\[ f_{XY}(x, y) = \begin{cases} kxy & 0 < x < y < 1 \\ 0 & \text{otherwise} \end{cases} \]


where k is a constant. 
(a) Find the value of k. 
(b) Are X and Y independent? 


(a) The range space Ryy is shown in Fig. 3-6. By Eq. (3.26), 


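Fig. 3-6

\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = k\int_0^1\int_0^y xy\,dx\,dy = k\int_0^1 y\left[\frac{x^2}{2}\right]_{x=0}^{x=y} dy = k\int_0^1 \frac{y^3}{2}\,dy = \frac{k}{8} = 1 \]

Thus k = 8.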
(b) By Eq. (3.30), the marginal pdf of X is 


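\[ f_X(x) = \begin{cases} \int_x^1 8xy\,dy = 4x(1 - x^2) & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} \]

By Eq. (3.31), the marginal pdf of Y is

\[ f_Y(y) = \begin{cases} \int_0^y 8xy\,dx = 4y^3 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases} \]

Since $f_{XY}(x, y) \ne f_X(x) f_Y(y)$, X and Y are not independent.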


Note that if the range space Ryy depends functionally on x or y, then X 
and Y cannot be independent r.v.’s. 


3.20. The joint pdf of a bivariate r.v. (X, Y) is given by 


\[ f_{XY}(x, y) = \begin{cases} k & 0 < y \le x < 1 \\ 0 & \text{otherwise} \end{cases} \]

where k is a constant.
(a) Determine the value of k.
(b) Find the marginal pdf’s of X and Y.
(c) Find P(0 < X < 1/2, 0 < Y < 1/2).


(a) The range space Ryy is shown in Fig. 3-7. By Eq. (3.26), 


Fig. 3-7

\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = k\iint_{R_{XY}} dx\,dy = k \times \text{area}(R_{XY}) = k\left(\tfrac{1}{2}\right) = 1 \]

Thus k = 2.
(b) By Eq. (3.30), the marginal pdf of X is 


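\[ f_X(x) = \begin{cases} \int_0^x 2\,dy = 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} \]

By Eq. (3.31), the marginal pdf of Y is

\[ f_Y(y) = \begin{cases} \int_y^1 2\,dx = 2(1 - y) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases} \]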


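(c) The region in the xy plane corresponding to the event (0 < X < 1/2, 0 < Y < 1/2)
is shown in Fig. 3-7 as the shaded area $R_s$. Then

\[ P\left(0 < X < \tfrac{1}{2},\ 0 < Y < \tfrac{1}{2}\right) = P\left(0 < X < \tfrac{1}{2},\ 0 < Y < X\right) = \iint_{R_s} f_{XY}(x, y)\,dx\,dy = 2\iint_{R_s} dx\,dy = 2 \times \text{area}(R_s) = 2\left(\tfrac{1}{8}\right) = \tfrac{1}{4} \]

Note that the bivariate r.v. (X, Y) is said to be uniformly distributed over the
region $R_{XY}$ if its pdf is

\[ f_{XY}(x, y) = \begin{cases} k & (x, y) \in R_{XY} \\ 0 & \text{otherwise} \end{cases} \tag{3.96} \]

where k is a constant. Then by Eq. (3.26), the constant k must be k = 1/(area of $R_{XY}$).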


3.21. Suppose we select one point at random from within the circle with radius R. If 
we let the center of the circle denote the origin and define X and Y to be the 
coordinates of the point chosen (Fig. 3-8), then (X, Y) is a uniform bivariate r.v. 
with joint pdf given by 


Fig. 3-8

\[ f_{XY}(x, y) = \begin{cases} k & x^2 + y^2 \le R^2 \\ 0 & x^2 + y^2 > R^2 \end{cases} \]


where k is a constant. 

(a) Determine the value of k. 

(b) Find the marginal pdf’s of X and Y.

(c) Find the probability that the distance from the origin of the point selected is 
not greater than a. 


(a) By Eq. (3.26), 


\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = k\iint_{x^2 + y^2 \le R^2} dx\,dy = k(\pi R^2) = 1 \]

Thus, k = 1/(πR²).
(b) By Eq. (3.30), the marginal pdf of X is 


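\[ f_X(x) = \frac{1}{\pi R^2}\int_{-\sqrt{R^2 - x^2}}^{\sqrt{R^2 - x^2}} dy = \frac{2}{\pi R^2}\sqrt{R^2 - x^2} \qquad x^2 \le R^2 \]

Hence,
\[ f_X(x) = \begin{cases} \dfrac{2}{\pi R^2}\sqrt{R^2 - x^2} & |x| \le R \\ 0 & |x| > R \end{cases} \]

By symmetry, the marginal pdf of Y is

\[ f_Y(y) = \begin{cases} \dfrac{2}{\pi R^2}\sqrt{R^2 - y^2} & |y| \le R \\ 0 & |y| > R \end{cases} \]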


(c) For 0 ≤ a ≤ R,

\[ P(X^2 + Y^2 \le a^2) = \iint_{x^2 + y^2 \le a^2} f_{XY}(x, y)\,dx\,dy = \frac{1}{\pi R^2}\iint_{x^2 + y^2 \le a^2} dx\,dy = \frac{\pi a^2}{\pi R^2} = \frac{a^2}{R^2} \]


3.22. The joint pdf of a bivariate r.v. (X, Y) is given by 


\[ f_{XY}(x, y) = \begin{cases} k e^{-(ax + by)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]

where a and b are positive constants and k is a constant.
(a) Determine the value of k.
(b) Are X and Y independent?


(a) By Eq. (3.26), 
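\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = k\int_0^{\infty}\int_0^{\infty} e^{-(ax + by)}\,dx\,dy = k\int_0^{\infty} e^{-ax}\,dx\int_0^{\infty} e^{-by}\,dy = \frac{k}{ab} = 1 \]

Thus, k = ab.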
(b) By Eq. (3.30), the marginal pdf of X is 


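\[ f_X(x) = ab\,e^{-ax}\int_0^{\infty} e^{-by}\,dy = a e^{-ax} \qquad x > 0 \]

By Eq. (3.31), the marginal pdf of Y is

\[ f_Y(y) = ab\,e^{-by}\int_0^{\infty} e^{-ax}\,dx = b e^{-by} \qquad y > 0 \]

Since $f_{XY}(x, y) = f_X(x) f_Y(y)$, X and Y are independent.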


3.23. A manufacturer has been using two different manufacturing processes to make 
computer memory chips. Let (X, Y) be a bivariate r.v., where X denotes the time 
to failure of chips made by process A and Y denotes the time to failure of chips 
made by process B. Assuming that the joint pdf of (X, Y) is 


\[ f_{XY}(x, y) = \begin{cases} ab\,e^{-(ax + by)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]

where $a = 10^{-4}$ and $b = 1.2(10^{-4})$, determine P(X > Y).


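The region in the xy plane corresponding to the event (X > Y) is shown in
Fig. 3-9 as the shaded area. Then

Fig. 3-9

\[ P(X > Y) = ab\int_0^{\infty}\int_0^{x} e^{-(ax + by)}\,dy\,dx = ab\int_0^{\infty} e^{-ax}\left[\int_0^x e^{-by}\,dy\right] dx = a\int_0^{\infty} e^{-ax}\left(1 - e^{-bx}\right) dx \]
\[ = \frac{b}{a + b} = \frac{1.2(10^{-4})}{(1 + 1.2)(10^{-4})} = 0.545 \]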
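
The closed form b/(a + b) is easy to check by simulation. The following sketch is illustrative only: the function names and the fixed seed are hypothetical, and the failure times are drawn as independent exponential r.v.'s with rates a and b, as the joint pdf above implies.

```python
import random


def prob_x_exceeds_y(a, b):
    """Closed form P(X > Y) = b / (a + b) for independent exponentials with rates a, b."""
    return b / (a + b)


def simulate_prob_x_exceeds_y(a, b, n_samples, seed=0):
    """Monte Carlo estimate of P(X > Y) using exponential samples."""
    rng = random.Random(seed)
    wins = sum(rng.expovariate(a) > rng.expovariate(b) for _ in range(n_samples))
    return wins / n_samples


exact = prob_x_exceeds_y(1e-4, 1.2e-4)
approx = simulate_prob_x_exceeds_y(1e-4, 1.2e-4, 100_000)
print(exact, approx)
```

The estimate fluctuates around the exact value with a standard error of roughly 0.002 for this sample size.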


3.24. A smooth-surface table is ruled with equidistant parallel lines a distance D apart. 
A needle of length L, where L < D, is randomly dropped onto this table. What is 
the probability that the needle will intersect one of the lines? (This is known as 
Buffon’s needle problem.)


We can determine the needle’s position by specifying a bivariate r.v. (X, Θ),
where X is the distance from the middle point of the needle to the nearest parallel
line and Θ is the angle from the vertical to the needle (Fig. 3-10). We interpret
the statement “the needle is randomly dropped” to mean that both X and Θ have
uniform distributions and that X and Θ are independent. The possible values of X
are between 0 and D/2, and the possible values of Θ are between 0 and π/2.
Thus, the joint pdf of (X, Θ) is

Fig. 3-10 Buffon’s needle problem.

\[ f_{X\Theta}(x, \theta) = f_X(x) f_\Theta(\theta) = \begin{cases} \dfrac{4}{\pi D} & 0 \le x \le \dfrac{D}{2},\ 0 \le \theta \le \dfrac{\pi}{2} \\ 0 & \text{otherwise} \end{cases} \]


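From Fig. 3-10, we see that the condition for the needle to intersect a line is
X < (L/2) cos θ. Thus, the probability that the needle will intersect a line is

\[ P\left(X < \frac{L}{2}\cos\Theta\right) = \int_0^{\pi/2}\int_0^{(L/2)\cos\theta} f_{X\Theta}(x, \theta)\,dx\,d\theta = \frac{4}{\pi D}\int_0^{\pi/2}\left[\int_0^{(L/2)\cos\theta} dx\right] d\theta = \frac{4}{\pi D}\int_0^{\pi/2} \frac{L}{2}\cos\theta\,d\theta = \frac{2L}{\pi D} \]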
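
A short simulation makes the result 2L/(πD) concrete. This is a sketch under the same modeling assumptions as the solution (X uniform on [0, D/2], Θ uniform on [0, π/2], independent); the function name `buffon_estimate`, the sample size, and the seed are hypothetical choices.

```python
import math
import random


def buffon_estimate(needle_len, line_gap, n_samples, seed=0):
    """Estimate the probability that a needle of length L < D crosses a line."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = rng.uniform(0.0, line_gap / 2)      # distance of midpoint to nearest line
        theta = rng.uniform(0.0, math.pi / 2)   # angle from the vertical
        if x < (needle_len / 2) * math.cos(theta):
            hits += 1
    return hits / n_samples


L, D = 1.0, 2.0
estimate = buffon_estimate(L, D, 200_000)
print(estimate, 2 * L / (math.pi * D))
```

For L = 1 and D = 2 the exact probability is 1/π, so the simulation doubles as a crude way of estimating π.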
Conditional Distributions 
3.25. Verify Eqs. (3.36) and (3.41). 
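(a) By Eqs. (3.33) and (3.20),

\[ \sum_{y_j} p_{Y|X}(y_j | x_i) = \frac{\sum_{y_j} p_{XY}(x_i, y_j)}{p_X(x_i)} = \frac{p_X(x_i)}{p_X(x_i)} = 1 \]

(b) Similarly, by Eqs. (3.38) and (3.30),

\[ \int_{-\infty}^{\infty} f_{Y|X}(y | x)\,dy = \frac{\int_{-\infty}^{\infty} f_{XY}(x, y)\,dy}{f_X(x)} = \frac{f_X(x)}{f_X(x)} = 1 \]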


3.26. Consider the bivariate r.v. (X, Y) of Prob. 3.14. 
(a) Find the conditional pmf’s $p_{Y|X}(y_j | x_i)$ and $p_{X|Y}(x_i | y_j)$.
(b) Find P(Y = 2 | X = 2) and P(X = 2 | Y = 2).

(a) From the results of Prob. 3.14, we have

\[ p_{XY}(x_i, y_j) = \begin{cases} \frac{1}{18}(2x_i + y_j) & x_i = 1, 2;\ y_j = 1, 2 \\ 0 & \text{otherwise} \end{cases} \]
\[ p_X(x_i) = \tfrac{1}{18}(4x_i + 3) \qquad x_i = 1, 2 \]
\[ p_Y(y_j) = \tfrac{1}{18}(2y_j + 6) \qquad y_j = 1, 2 \]

Thus, by Eqs. (3.33) and (3.34),

\[ p_{Y|X}(y_j | x_i) = \frac{2x_i + y_j}{4x_i + 3} \qquad y_j = 1, 2;\ x_i = 1, 2 \]
\[ p_{X|Y}(x_i | y_j) = \frac{2x_i + y_j}{2y_j + 6} \qquad x_i = 1, 2;\ y_j = 1, 2 \]

(b) Using the results of part (a), we obtain

\[ P(Y = 2 | X = 2) = p_{Y|X}(2 | 2) = \frac{2(2) + 2}{4(2) + 3} = \frac{6}{11} \]
\[ P(X = 2 | Y = 2) = p_{X|Y}(2 | 2) = \frac{2(2) + 2}{2(2) + 6} = \frac{3}{5} \]
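3.27. Find the conditional pmf’s $p_{Y|X}(y_j | x_i)$ and $p_{X|Y}(x_i | y_j)$ for the bivariate r.v.
(X, Y) of Prob. 3.15.

From the results of Prob. 3.15, we have

\[ p_{XY}(x_i, y_j) = \begin{cases} \frac{1}{30} x_i^2 y_j & x_i = 1, 2;\ y_j = 1, 2, 3 \\ 0 & \text{otherwise} \end{cases} \]
\[ p_X(x_i) = \tfrac{1}{5} x_i^2 \qquad x_i = 1, 2 \]
\[ p_Y(y_j) = \tfrac{1}{6} y_j \qquad y_j = 1, 2, 3 \]

Thus, by Eqs. (3.33) and (3.34),

\[ p_{Y|X}(y_j | x_i) = \frac{\frac{1}{30} x_i^2 y_j}{\frac{1}{5} x_i^2} = \frac{1}{6} y_j \qquad y_j = 1, 2, 3;\ x_i = 1, 2 \]
\[ p_{X|Y}(x_i | y_j) = \frac{\frac{1}{30} x_i^2 y_j}{\frac{1}{6} y_j} = \frac{1}{5} x_i^2 \qquad x_i = 1, 2;\ y_j = 1, 2, 3 \]

Note that $p_{Y|X}(y_j | x_i) = p_Y(y_j)$ and $p_{X|Y}(x_i | y_j) = p_X(x_i)$, as must be the case since
X and Y are independent, as shown in Prob. 3.15.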


3.28. Consider the bivariate r.v. (X, Y) of Prob. 3.17. 
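(a) Find the conditional pdf’s $f_{Y|X}(y | x)$ and $f_{X|Y}(x | y)$.
(b) Find P(0 < Y < 1/2 | X = 1).

(a) From the results of Prob. 3.17, we have

\[ f_{XY}(x, y) = \begin{cases} \frac{1}{8}(x + y) & 0 < x < 2,\ 0 < y < 2 \\ 0 & \text{otherwise} \end{cases} \]
\[ f_X(x) = \tfrac{1}{4}(x + 1) \qquad 0 < x < 2 \]
\[ f_Y(y) = \tfrac{1}{4}(y + 1) \qquad 0 < y < 2 \]

Thus, by Eqs. (3.38) and (3.39),

\[ f_{Y|X}(y | x) = \frac{\frac{1}{8}(x + y)}{\frac{1}{4}(x + 1)} = \frac{1}{2}\,\frac{x + y}{x + 1} \qquad 0 < x < 2,\ 0 < y < 2 \]
\[ f_{X|Y}(x | y) = \frac{\frac{1}{8}(x + y)}{\frac{1}{4}(y + 1)} = \frac{1}{2}\,\frac{x + y}{y + 1} \qquad 0 < x < 2,\ 0 < y < 2 \]

(b) Using the results of part (a), we obtain

\[ P\left(0 < Y < \tfrac{1}{2} \,\middle|\, X = 1\right) = \int_0^{1/2} f_{Y|X}(y | 1)\,dy = \frac{1}{2}\int_0^{1/2} \frac{1 + y}{2}\,dy = \frac{5}{32} \]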


3.29. Find the conditional pdf’s $f_{Y|X}(y | x)$ and $f_{X|Y}(x | y)$ for the bivariate r.v. (X, Y) of
Prob. 3.18.

From the results of Prob. 3.18, we have

\[ f_{XY}(x, y) = \begin{cases} 4xy & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases} \]
\[ f_X(x) = 2x \qquad 0 < x < 1 \]
\[ f_Y(y) = 2y \qquad 0 < y < 1 \]

Thus, by Eqs. (3.38) and (3.39),

\[ f_{Y|X}(y | x) = \frac{4xy}{2x} = 2y \qquad 0 < y < 1,\ 0 < x < 1 \]
\[ f_{X|Y}(x | y) = \frac{4xy}{2y} = 2x \qquad 0 < x < 1,\ 0 < y < 1 \]

Again note that $f_{Y|X}(y | x) = f_Y(y)$ and $f_{X|Y}(x | y) = f_X(x)$, as must be the case since
X and Y are independent, as shown in Prob. 3.18.


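3.30. Find the conditional pdf’s $f_{Y|X}(y | x)$ and $f_{X|Y}(x | y)$ for the bivariate r.v. (X, Y) of
Prob. 3.20.

From the results of Prob. 3.20, we have

\[ f_{XY}(x, y) = \begin{cases} 2 & 0 < y \le x < 1 \\ 0 & \text{otherwise} \end{cases} \]
\[ f_X(x) = 2x \qquad 0 < x < 1 \]
\[ f_Y(y) = 2(1 - y) \qquad 0 < y < 1 \]

Thus, by Eqs. (3.38) and (3.39),

\[ f_{Y|X}(y | x) = \frac{2}{2x} = \frac{1}{x} \qquad 0 < y \le x < 1 \]
\[ f_{X|Y}(x | y) = \frac{2}{2(1 - y)} = \frac{1}{1 - y} \qquad 0 < y \le x < 1 \]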


3.31. The joint pdf of a bivariate r.v. (X, Y) is given by 


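\[ f_{XY}(x, y) = \begin{cases} \dfrac{1}{y}\,e^{-x/y}\,e^{-y} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]

(a) Show that $f_{XY}(x, y)$ satisfies Eq. (3.26).
(b) Find P(X > 1 | Y = y).

(a) We have

\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = \int_0^{\infty}\int_0^{\infty} \frac{1}{y}\,e^{-x/y}\,e^{-y}\,dx\,dy = \int_0^{\infty} \frac{1}{y}\,e^{-y}\left(\int_0^{\infty} e^{-x/y}\,dx\right) dy \]
\[ = \int_0^{\infty} \frac{1}{y}\,e^{-y}\left[-y\,e^{-x/y}\right]_{x=0}^{x=\infty} dy = \int_0^{\infty} e^{-y}\,dy = 1 \]

(b) First we must find the marginal pdf of Y. By Eq. (3.31),

\[ f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx = \frac{1}{y}\,e^{-y}\int_0^{\infty} e^{-x/y}\,dx = \frac{1}{y}\,e^{-y}\left[-y\,e^{-x/y}\right]_{x=0}^{x=\infty} = e^{-y} \qquad y > 0 \]

By Eq. (3.39), the conditional pdf of X is

\[ f_{X|Y}(x | y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \begin{cases} \dfrac{1}{y}\,e^{-x/y} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]

Then

\[ P(X > 1 | Y = y) = \int_1^{\infty} f_{X|Y}(x | y)\,dx = \int_1^{\infty} \frac{1}{y}\,e^{-x/y}\,dx = \left[-e^{-x/y}\right]_{x=1}^{x=\infty} = e^{-1/y} \]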

Covariance and Correlation Coefficients 


3.32. Let (X, Y) be a bivariate r.v. If X and Y are independent, show that X and Y are 
uncorrelated. 


If (X, Y) is a discrete bivariate r.v., then by Eqs. (3.43) and (3.22), 


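\[ E(XY) = \sum_{y_j}\sum_{x_i} x_i y_j\,p_{XY}(x_i, y_j) = \sum_{y_j}\sum_{x_i} x_i y_j\,p_X(x_i)\,p_Y(y_j) = \left[\sum_{x_i} x_i\,p_X(x_i)\right]\left[\sum_{y_j} y_j\,p_Y(y_j)\right] = E(X)E(Y) \]

If (X, Y) is a continuous bivariate r.v., then by Eqs. (3.43) and (3.32),

\[ E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{XY}(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_X(x) f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} x f_X(x)\,dx\int_{-\infty}^{\infty} y f_Y(y)\,dy = E(X)E(Y) \]

Thus, X and Y are uncorrelated by Eq. (3.52).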


3.33. Suppose the joint pmf of a bivariate r.v. (X, Y) is given by 


\[ p_{XY}(x_i, y_j) = \begin{cases} \frac{1}{3} & (0, 1),\ (1, 0),\ (2, 1) \\ 0 & \text{otherwise} \end{cases} \]


(a) Are X and Y independent? 
(b) Are X and Y uncorrelated ? 


(a) By Eq. (3.20), the marginal pmf’s of X are 


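\[ p_X(0) = \sum_{y_j} p_{XY}(0, y_j) = p_{XY}(0, 1) = \tfrac{1}{3} \]
\[ p_X(1) = \sum_{y_j} p_{XY}(1, y_j) = p_{XY}(1, 0) = \tfrac{1}{3} \]
\[ p_X(2) = \sum_{y_j} p_{XY}(2, y_j) = p_{XY}(2, 1) = \tfrac{1}{3} \]

By Eq. (3.21), the marginal pmf’s of Y are

\[ p_Y(0) = \sum_{x_i} p_{XY}(x_i, 0) = p_{XY}(1, 0) = \tfrac{1}{3} \]
\[ p_Y(1) = \sum_{x_i} p_{XY}(x_i, 1) = p_{XY}(0, 1) + p_{XY}(2, 1) = \tfrac{2}{3} \]

and

\[ p_{XY}(0, 1) = \tfrac{1}{3} \ne p_X(0)\,p_Y(1) = \tfrac{2}{9} \]

Thus, X and Y are not independent.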
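(b) By Eqs. (3.45a), (3.45b), and (3.43), we have

\[ E(X) = \sum_{x_i} x_i\,p_X(x_i) = (0)\left(\tfrac{1}{3}\right) + (1)\left(\tfrac{1}{3}\right) + (2)\left(\tfrac{1}{3}\right) = 1 \]
\[ E(Y) = \sum_{y_j} y_j\,p_Y(y_j) = (0)\left(\tfrac{1}{3}\right) + (1)\left(\tfrac{2}{3}\right) = \tfrac{2}{3} \]
\[ E(XY) = \sum_{y_j}\sum_{x_i} x_i y_j\,p_{XY}(x_i, y_j) = (0)(1)\left(\tfrac{1}{3}\right) + (1)(0)\left(\tfrac{1}{3}\right) + (2)(1)\left(\tfrac{1}{3}\right) = \tfrac{2}{3} \]

Now by Eq. (3.51),

\[ \text{Cov}(X, Y) = E(XY) - E(X)E(Y) = \tfrac{2}{3} - (1)\left(\tfrac{2}{3}\right) = 0 \]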


Thus, X and Y are uncorrelated. 
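
Problem 3.33 is a standard example of r.v.'s that are uncorrelated but not independent, and both properties can be checked directly from the joint pmf. The sketch below is illustrative; the helper names `marginals`, `covariance`, and `is_independent` are hypothetical.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Joint pmf of Prob. 3.33: mass 1/3 at each of (0, 1), (1, 0), (2, 1).
joint = {(0, 1): third, (1, 0): third, (2, 1): third}


def marginals(joint):
    """Return the marginal pmf's of X and Y as dictionaries."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y


def covariance(joint):
    """Cov(X, Y) = E(XY) - E(X)E(Y)."""
    e_x = sum(x * p for (x, _), p in joint.items())
    e_y = sum(y * p for (_, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - e_x * e_y


def is_independent(joint):
    """True only if p_XY(x, y) = p_X(x) p_Y(y) at every point of the range grid."""
    p_x, p_y = marginals(joint)
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y] for x in p_x for y in p_y)


print(covariance(joint), is_independent(joint))
```

Zero covariance with a failed factorization test is exactly the situation described in the solution.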


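3.34. Let (X, Y) be a bivariate r.v. with the joint pdf

\[ f_{XY}(x, y) = \frac{x^2 + y^2}{4\pi}\,e^{-(x^2 + y^2)/2} \qquad -\infty < x < \infty,\ -\infty < y < \infty \]

Show that X and Y are not independent but are uncorrelated.

By Eq. (3.30), the marginal pdf of X is

\[ f_X(x) = \frac{1}{4\pi}\int_{-\infty}^{\infty} (x^2 + y^2)\,e^{-(x^2 + y^2)/2}\,dy = \frac{e^{-x^2/2}}{2\sqrt{2\pi}}\left(x^2\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,dy + \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,y^2 e^{-y^2/2}\,dy\right) \]

Noting that the integrand of the first integral in the above expression is the pdf of
N(0; 1) and the second integral in the above expression is the variance of N(0;
1), we have

\[ f_X(x) = \frac{1}{2\sqrt{2\pi}}\,(x^2 + 1)\,e^{-x^2/2} \qquad -\infty < x < \infty \]

Since $f_{XY}(x, y)$ is symmetric in x and y, we have

\[ f_Y(y) = \frac{1}{2\sqrt{2\pi}}\,(y^2 + 1)\,e^{-y^2/2} \qquad -\infty < y < \infty \]

Now $f_{XY}(x, y) \ne f_X(x) f_Y(y)$, and hence X and Y are not independent. Next, by Eqs.
(3.47a) and (3.47b),

\[ E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = 0 \qquad E(Y) = \int_{-\infty}^{\infty} y f_Y(y)\,dy = 0 \]

since for each integral the integrand is an odd function. By Eq. (3.43),

\[ E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{XY}(x, y)\,dx\,dy = 0 \]

The integral vanishes because the contributions of the second and the fourth
quadrants cancel those of the first and the third. Thus, E(XY) = E(X)E(Y), and so
X and Y are uncorrelated.


3.35. Let (X, Y) be a bivariate r.v. Show that

\[ [E(XY)]^2 \le E(X^2)\,E(Y^2) \tag{3.97} \]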


This is known as the Cauchy-Schwarz inequality. 


Consider the expression E[(X − αY)²] for any two r.v.’s X and Y and a real
variable α. This expression, when viewed as a quadratic in α, is greater than or
equal to zero; that is,

\[ E[(X - \alpha Y)^2] \ge 0 \]

for any value of α. Expanding this, we obtain

\[ E(X^2) - 2\alpha E(XY) + \alpha^2 E(Y^2) \ge 0 \]

Choose a value of α for which the left-hand side of this inequality is minimum,

\[ \alpha = \frac{E(XY)}{E(Y^2)} \]

which results in the inequality

\[ E(X^2) - \frac{[E(XY)]^2}{E(Y^2)} \ge 0 \qquad \text{or} \qquad [E(XY)]^2 \le E(X^2)\,E(Y^2) \]
3.36. Verify Eq. (3.54). 
From the Cauchy-Schwarz inequality [Eq. (3.97)], we have 


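\[ \{E[(X - \mu_X)(Y - \mu_Y)]\}^2 \le E[(X - \mu_X)^2]\,E[(Y - \mu_Y)^2] \qquad \text{or} \qquad \sigma_{XY}^2 \le \sigma_X^2\,\sigma_Y^2 \]

Then
\[ \rho_{XY}^2 = \frac{\sigma_{XY}^2}{\sigma_X^2\,\sigma_Y^2} \le 1 \]

Since $\rho_{XY}$ is a real number, this implies

\[ |\rho_{XY}| \le 1 \qquad \text{or} \qquad -1 \le \rho_{XY} \le 1 \]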


3.37. Let (X, Y) be the bivariate r.v. of Prob. 3.12.
(a) Find the mean and the variance of X. 
(b) Find the mean and the variance of Y. 
(c) Find the covariance of X and Y. 
(d) Find the correlation coefficient of X and Y. 


(a) From the results of Prob. 3.12, the mean and the variance of X are evaluated
as follows:

\[ E(X) = \sum_{x_i} x_i\,p_X(x_i) = (0)(0.5) + (1)(0.5) = 0.5 \]
\[ E(X^2) = \sum_{x_i} x_i^2\,p_X(x_i) = (0)^2(0.5) + (1)^2(0.5) = 0.5 \]
\[ \sigma_X^2 = E(X^2) - [E(X)]^2 = 0.5 - (0.5)^2 = 0.25 \]

(b) Similarly, the mean and the variance of Y are

\[ E(Y) = \sum_{y_j} y_j\,p_Y(y_j) = (0)(0.55) + (1)(0.45) = 0.45 \]
\[ E(Y^2) = \sum_{y_j} y_j^2\,p_Y(y_j) = (0)^2(0.55) + (1)^2(0.45) = 0.45 \]
\[ \sigma_Y^2 = E(Y^2) - [E(Y)]^2 = 0.45 - (0.45)^2 = 0.2475 \]

(c) By Eq. (3.43),

\[ E(XY) = \sum_{y_j}\sum_{x_i} x_i y_j\,p_{XY}(x_i, y_j) = (0)(0)(0.45) + (0)(1)(0.05) + (1)(0)(0.1) + (1)(1)(0.4) = 0.4 \]

By Eq. (3.51), the covariance of X and Y is

\[ \text{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0.4 - (0.5)(0.45) = 0.175 \]

(d) By Eq. (3.53), the correlation coefficient of X and Y is

\[ \rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X\,\sigma_Y} = \frac{0.175}{\sqrt{(0.25)(0.2475)}} \approx 0.704 \]

3.38. Suppose that a bivariate r.v. (X, Y) is uniformly distributed over a unit circle 
(Prob. 3.21). 


(a) Are X and Y independent? 
(b) Are X and Y correlated? 


(a) Setting R = 1 in the results of Prob. 3.21, we obtain 


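\[ f_{XY}(x, y) = \begin{cases} \dfrac{1}{\pi} & x^2 + y^2 < 1 \\ 0 & x^2 + y^2 > 1 \end{cases} \]
\[ f_X(x) = \frac{2}{\pi}\sqrt{1 - x^2} \qquad |x| < 1 \]
\[ f_Y(y) = \frac{2}{\pi}\sqrt{1 - y^2} \qquad |y| < 1 \]

Since $f_{XY}(x, y) \ne f_X(x) f_Y(y)$, X and Y are not independent.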
(b) By Eqs. (3.47a) and (3.47b), the means of X and Y are

\[ E(X) = \frac{2}{\pi}\int_{-1}^{1} x\sqrt{1 - x^2}\,dx = 0 \]
\[ E(Y) = \frac{2}{\pi}\int_{-1}^{1} y\sqrt{1 - y^2}\,dy = 0 \]

since each integrand is an odd function.
Next, by Eq. (3.43),

\[ E(XY) = \frac{1}{\pi}\iint_{x^2 + y^2 < 1} xy\,dx\,dy = 0 \]

The integral vanishes because the contributions of the second and the fourth
quadrants cancel those of the first and the third. Hence, E(XY) = E(X)E(Y) =
0 and X and Y are uncorrelated.


Conditional Means and Conditional Variances 


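3.39. Consider the bivariate r.v. (X, Y) of Prob. 3.14 (or Prob. 3.26). Compute the
conditional mean and the conditional variance of Y given $x_i = 2$.

From Prob. 3.26, the conditional pmf $p_{Y|X}(y_j | x_i)$ is

\[ p_{Y|X}(y_j | x_i) = \frac{2x_i + y_j}{4x_i + 3} \qquad y_j = 1, 2;\ x_i = 1, 2 \]

Thus,
\[ p_{Y|X}(y_j | 2) = \frac{4 + y_j}{11} \qquad y_j = 1, 2 \]

and by Eqs. (3.55) and (3.56), the conditional mean and the conditional variance
of Y given $x_i = 2$ are

\[ \mu_{Y|2} = E(Y | x_i = 2) = \sum_{y_j} y_j\,p_{Y|X}(y_j | 2) = (1)\left(\tfrac{5}{11}\right) + (2)\left(\tfrac{6}{11}\right) = \tfrac{17}{11} \approx 1.545 \]
\[ \sigma_{Y|2}^2 = E\left[(Y - \mu_{Y|2})^2 \,\middle|\, x_i = 2\right] = \sum_{y_j}\left(y_j - \tfrac{17}{11}\right)^2 \frac{4 + y_j}{11} = \left(-\tfrac{6}{11}\right)^2\left(\tfrac{5}{11}\right) + \left(\tfrac{5}{11}\right)^2\left(\tfrac{6}{11}\right) = \tfrac{330}{1331} \approx 0.248 \]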

3.40. Let (X, Y) be the bivariate r.v. of Prob. 3.20 (or Prob. 3.30). Compute the 
conditional means E(Y |x) and E(X |y). 


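From Prob. 3.30,

\[ f_{Y|X}(y | x) = \frac{1}{x} \qquad 0 < y \le x < 1 \]
\[ f_{X|Y}(x | y) = \frac{1}{1 - y} \qquad 0 < y \le x < 1 \]

By Eq. (3.58), the conditional mean of Y, given X = x, is

\[ E(Y | x) = \int_{-\infty}^{\infty} y\,f_{Y|X}(y | x)\,dy = \int_0^x y\left(\frac{1}{x}\right) dy = \left[\frac{y^2}{2x}\right]_{y=0}^{y=x} = \frac{x}{2} \qquad 0 < x < 1 \]

Similarly, the conditional mean of X, given Y = y, is

\[ E(X | y) = \int_{-\infty}^{\infty} x\,f_{X|Y}(x | y)\,dx = \int_y^1 x\left(\frac{1}{1 - y}\right) dx = \left[\frac{x^2}{2(1 - y)}\right]_{x=y}^{x=1} = \frac{1 + y}{2} \qquad 0 < y < 1 \]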


Note that E(Y |x) is a function of x only and E(X |y) is a function of y only. 


3.41. Let (X, Y) be the bivariate r.v. of Prob. 3.20 (or Prob. 3.30). Compute the 
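conditional variances Var(Y | x) and Var(X | y).

Using the results of Prob. 3.40 and Eq. (3.59), the conditional variance of Y,
given X = x, is

\[ \text{Var}(Y | x) = E\{[Y - E(Y | x)]^2 | x\} = \int_0^x \left(y - \frac{x}{2}\right)^2 \frac{1}{x}\,dy = \frac{1}{3x}\left[\left(y - \frac{x}{2}\right)^3\right]_{y=0}^{y=x} = \frac{x^2}{12} \qquad 0 < x < 1 \]

Similarly, the conditional variance of X, given Y = y, is

\[ \text{Var}(X | y) = E\{[X - E(X | y)]^2 | y\} = \int_y^1 \left(x - \frac{1 + y}{2}\right)^2 \frac{1}{1 - y}\,dx = \frac{1}{3(1 - y)}\left[\left(x - \frac{1 + y}{2}\right)^3\right]_{x=y}^{x=1} = \frac{(1 - y)^2}{12} \qquad 0 < y < 1 \]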

N-Dimensional Random Vectors 


3.42. Let $(X_1, X_2, X_3, X_4)$ be a four-dimensional random vector, where $X_k$ (k = 1, 2, 3,
4) are independent Poisson r.v.’s with parameter 2.
(a) Find $P(X_1 = 1, X_2 = 3, X_3 = 2, X_4 = 1)$.
(b) Find the probability that exactly one of the $X_k$’s equals zero.

(a) By Eq. (2.48), the pmf of $X_k$ is

\[ p_{X_k}(i) = P(X_k = i) = e^{-2}\,\frac{2^i}{i!} \qquad i = 0, 1, \ldots \tag{3.98} \]

Since the $X_k$’s are independent, by Eq. (3.80),

\[ P(X_1 = 1, X_2 = 3, X_3 = 2, X_4 = 1) = p_{X_1}(1)\,p_{X_2}(3)\,p_{X_3}(2)\,p_{X_4}(1) \]
\[ = \left(e^{-2}\,\frac{2}{1!}\right)\left(e^{-2}\,\frac{2^3}{3!}\right)\left(e^{-2}\,\frac{2^2}{2!}\right)\left(e^{-2}\,\frac{2}{1!}\right) = \frac{e^{-8}\,2^7}{12} \approx 3.58(10^{-3}) \]

(b) First, we find the probability that $X_k = 0$, k = 1, 2, 3, 4. From Eq. (3.98),

\[ P(X_k = 0) = e^{-2} \qquad k = 1, 2, 3, 4 \]

Next, we treat zero as “success.” If Y denotes the number of successes,
then Y is a binomial r.v. with parameters $(n, p) = (4, e^{-2})$. Thus, the
probability that exactly one of the $X_k$’s equals zero is given by [Eq. (2.36)]

\[ P(Y = 1) = \binom{4}{1}\,e^{-2}\left(1 - e^{-2}\right)^3 \approx 0.35 \]
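
Both numerical answers of Prob. 3.42 can be reproduced directly from the Poisson pmf. This is a minimal sketch; the helper name `poisson_pmf` is hypothetical.

```python
import math


def poisson_pmf(i, lam):
    """Poisson pmf: P(X = i) = exp(-lam) * lam**i / i!."""
    return math.exp(-lam) * lam**i / math.factorial(i)


lam = 2

# Part (a): independence turns the joint probability into a product of pmf values.
p_a = poisson_pmf(1, lam) * poisson_pmf(3, lam) * poisson_pmf(2, lam) * poisson_pmf(1, lam)

# Part (b): "zero" is a success with probability exp(-2); count successes in 4 trials.
p_zero = poisson_pmf(0, lam)
p_b = math.comb(4, 1) * p_zero * (1 - p_zero) ** 3

print(p_a, p_b)
```

Part (b) reuses the binomial pmf with success probability P(X_k = 0), mirroring the argument in the solution.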


3.43. Let (X, Y, Z) be a trivariate r.v., where X, Y, and Z are independent uniform r.v.’s
over (0, 1). Compute P(Z ≥ XY).

Since X, Y, Z are independent and uniformly distributed over (0, 1), we have

\[ f_{XYZ}(x, y, z) = f_X(x) f_Y(y) f_Z(z) = 1 \qquad 0 < x < 1,\ 0 < y < 1,\ 0 < z < 1 \]

Then
\[ P(Z \ge XY) = \iiint_{z \ge xy} f_{XYZ}(x, y, z)\,dx\,dy\,dz = \int_0^1\int_0^1\int_{xy}^1 dz\,dy\,dx = \int_0^1\int_0^1 (1 - xy)\,dy\,dx = \int_0^1\left(1 - \frac{x}{2}\right) dx = \frac{3}{4} \]

3.44. Let (X, Y, Z) be a trivariate r.v. with joint pdf 


\[ f_{XYZ}(x, y, z) = \begin{cases} k e^{-(ax + by + cz)} & x > 0,\ y > 0,\ z > 0 \\ 0 & \text{otherwise} \end{cases} \]

where a, b, c > 0 and k are constants.

(a) Determine the value of k. 

(b) Find the marginal joint pdf of X and Y. 
(c) Find the marginal pdf of X. 

(d) Are X, Y, and Z independent? 


(a) By Eq. (3.76), 
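\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XYZ}(x, y, z)\,dx\,dy\,dz = k\int_0^{\infty}\int_0^{\infty}\int_0^{\infty} e^{-(ax + by + cz)}\,dx\,dy\,dz \]
\[ = k\int_0^{\infty} e^{-ax}\,dx\int_0^{\infty} e^{-by}\,dy\int_0^{\infty} e^{-cz}\,dz = \frac{k}{abc} = 1 \]

Thus, k = abc.
(b) By Eq. (3.77), the marginal joint pdf of X and Y is

\[ f_{XY}(x, y) = \int_{-\infty}^{\infty} f_{XYZ}(x, y, z)\,dz = abc\int_0^{\infty} e^{-(ax + by + cz)}\,dz = abc\,e^{-(ax + by)}\int_0^{\infty} e^{-cz}\,dz = ab\,e^{-(ax + by)} \qquad x > 0,\ y > 0 \]

(c) By Eq. (3.78), the marginal pdf of X is

\[ f_X(x) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XYZ}(x, y, z)\,dy\,dz = abc\int_0^{\infty}\int_0^{\infty} e^{-(ax + by + cz)}\,dy\,dz = abc\,e^{-ax}\int_0^{\infty} e^{-by}\,dy\int_0^{\infty} e^{-cz}\,dz = a e^{-ax} \qquad x > 0 \]

(d) Similarly, we obtain

\[ f_Y(y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XYZ}(x, y, z)\,dx\,dz = b e^{-by} \qquad y > 0 \]
\[ f_Z(z) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XYZ}(x, y, z)\,dx\,dy = c e^{-cz} \qquad z > 0 \]

Since $f_{XYZ}(x, y, z) = f_X(x) f_Y(y) f_Z(z)$, X, Y, and Z are independent.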


3.45. Show that 


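\[ f_{XYZ}(x, y, z) = f_{Z|X,Y}(z | x, y)\,f_{Y|X}(y | x)\,f_X(x) \tag{3.99} \]

By definition (3.79),

\[ f_{Z|X,Y}(z | x, y) = \frac{f_{XYZ}(x, y, z)}{f_{XY}(x, y)} \]

Hence
\[ f_{XYZ}(x, y, z) = f_{Z|X,Y}(z | x, y)\,f_{XY}(x, y) \tag{3.100} \]

Now, by Eq. (3.38),

\[ f_{XY}(x, y) = f_{Y|X}(y | x)\,f_X(x) \]

Substituting this expression into Eq. (3.100), we obtain

\[ f_{XYZ}(x, y, z) = f_{Z|X,Y}(z | x, y)\,f_{Y|X}(y | x)\,f_X(x) \]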


Special Distributions 


3.46. Derive Eq. (3.87). 


Consider a sequence of n independent multinomial trials. Let $A_i$ (i = 1, 2, ..., k)
be the outcome of a single trial. The r.v. $X_i$ is equal to the number of times $A_i$
occurs in the n trials. If $x_1, x_2, \ldots, x_k$ are nonnegative integers such that their sum
equals n, then for such a sequence the probability that $A_i$ occurs $x_i$ times, i = 1, 2,
..., k (that is, $P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k)$) can be obtained by counting the
number of sequences containing exactly $x_1$ $A_1$’s, $x_2$ $A_2$’s, ..., $x_k$ $A_k$’s and
multiplying by $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$. The total number of such sequences is given by the
number of ways we could lay out in a row n things, of which $x_1$ are of one kind,
$x_2$ are of a second kind, ..., $x_k$ are of a kth kind. The number of ways we could
choose $x_1$ positions for the $A_1$’s is $\binom{n}{x_1}$; after having put the $A_1$’s in their position,
the number of ways we could choose positions for the $A_2$’s is $\binom{n - x_1}{x_2}$, and so
on. Thus, the total number of sequences with $x_1$ $A_1$’s, $x_2$ $A_2$’s, ..., $x_k$ $A_k$’s is given
by

\[ \binom{n}{x_1}\binom{n - x_1}{x_2}\binom{n - x_1 - x_2}{x_3}\cdots\binom{n - x_1 - x_2 - \cdots - x_{k-1}}{x_k} \]
\[ = \frac{n!}{x_1!\,(n - x_1)!}\,\frac{(n - x_1)!}{x_2!\,(n - x_1 - x_2)!}\cdots\frac{(n - x_1 - x_2 - \cdots - x_{k-1})!}{x_k!\,0!} = \frac{n!}{x_1!\,x_2!\cdots x_k!} \]

Thus, we obtain

\[ P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} \]


3.47. Suppose that a fair die is rolled seven times. Find the probability that 1 and 2 
dots appear twice each; 3, 4, and 5 dots once each; and 6 dots not at all. 


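Let $(X_1, X_2, \ldots, X_6)$ be a six-dimensional random vector, where $X_i$ denotes the
number of times i dots appear in seven rolls of a fair die. Then $(X_1, X_2, \ldots, X_6)$ is
a multinomial r.v. with parameters $(7, p_1, p_2, \ldots, p_6)$ where $p_i = \frac{1}{6}$ (i = 1, 2, ..., 6).
Hence, by Eq. (3.87),

\[ P(X_1 = 2, X_2 = 2, X_3 = 1, X_4 = 1, X_5 = 1, X_6 = 0) = \frac{7!}{2!\,2!\,1!\,1!\,1!\,0!}\left(\tfrac{1}{6}\right)^2\left(\tfrac{1}{6}\right)^2\left(\tfrac{1}{6}\right)^1\left(\tfrac{1}{6}\right)^1\left(\tfrac{1}{6}\right)^1\left(\tfrac{1}{6}\right)^0 \]
\[ = \frac{7!}{2!\,2!}\left(\tfrac{1}{6}\right)^7 = \frac{35}{6^5} \approx 0.0045 \]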
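
The multinomial probability can be evaluated exactly with integer and fraction arithmetic. The sketch below is an illustration of Eq. (3.87) as it is used in Prob. 3.47; the function name `multinomial_pmf` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_k = x_k) = n!/(x_1! ... x_k!) * p_1^x_1 ... p_k^x_k."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)  # exact at every step: partial quotients are integers
    prob = Fraction(coeff)
    for c, p in zip(counts, probs):
        prob *= Fraction(p) ** c
    return prob


# Seven rolls of a fair die: faces 1 and 2 twice, faces 3, 4, 5 once, face 6 never.
counts = (2, 2, 1, 1, 1, 0)
probs = [Fraction(1, 6)] * 6
p = multinomial_pmf(counts, probs)
print(p, float(p))
```

Summing the same function over all count vectors with a fixed total gives 1, which is the statement verified in Prob. 3.48.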
3.48. Show that the pmf of a multinomial r.v. given by Eq. (3.87) satisfies the
condition (3.68); that is,

\[ \sum\sum\cdots\sum\ p_{X_1 X_2 \cdots X_k}(x_1, x_2, \ldots, x_k) = 1 \tag{3.101} \]

where the summation is over the set of all nonnegative integers $x_1, x_2, \ldots, x_k$
whose sum is n.

The multinomial theorem (which is an extension of the binomial theorem) states
that

\[ (a_1 + a_2 + \cdots + a_k)^n = \sum \binom{n}{x_1\ x_2\ \cdots\ x_k}\,a_1^{x_1} a_2^{x_2} \cdots a_k^{x_k} \tag{3.102} \]

where $x_1 + x_2 + \cdots + x_k = n$ and

\[ \binom{n}{x_1\ x_2\ \cdots\ x_k} = \frac{n!}{x_1!\,x_2!\cdots x_k!} \]

is called the multinomial coefficient, and the summation is over the set of all
nonnegative integers $x_1, x_2, \ldots, x_k$ whose sum is n.

Thus, setting $a_i = p_i$ in Eq. (3.102), we obtain

\[ \sum\sum\cdots\sum\ p_{X_1 X_2 \cdots X_k}(x_1, x_2, \ldots, x_k) = (p_1 + p_2 + \cdots + p_k)^n = (1)^n = 1 \]


3.49. Let (X, Y) be a bivariate normal r.v. with its pdf given by Eq. (3.88). 
(a) Find the marginal pdf’s of X and Y. 
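(b) Show that X and Y are independent when ρ = 0.

(a) By Eq. (3.30), the marginal pdf of X is

\[ f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy \]

From Eqs. (3.88) and (3.89), we have

\[ f_{XY}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y(1 - \rho^2)^{1/2}}\exp\left[-\frac{1}{2}\,q(x, y)\right] \]
\[ q(x, y) = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right] \]

Rewriting q(x, y),

\[ q(x, y) = \frac{1}{1 - \rho^2}\left[\left(\frac{y - \mu_Y}{\sigma_Y}\right) - \rho\left(\frac{x - \mu_X}{\sigma_X}\right)\right]^2 + \left(\frac{x - \mu_X}{\sigma_X}\right)^2 \]
\[ = \frac{1}{(1 - \rho^2)\sigma_Y^2}\left[y - \mu_Y - \rho\,\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right]^2 + \left(\frac{x - \mu_X}{\sigma_X}\right)^2 \]

Then
\[ f_X(x) = \frac{\exp\left[-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^2\right]}{\sqrt{2\pi}\,\sigma_X}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_Y(1 - \rho^2)^{1/2}}\exp\left[-\frac{1}{2}\,q_1(x, y)\right] dy \]

where
\[ q_1(x, y) = \frac{1}{(1 - \rho^2)\sigma_Y^2}\left[y - \mu_Y - \rho\,\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right]^2 \]

Comparing the integrand with Eq. (2.71), we see that the integrand is a
normal pdf with mean $\mu_Y + \rho(\sigma_Y/\sigma_X)(x - \mu_X)$ and variance $(1 - \rho^2)\sigma_Y^2$. Thus,
the integral must be unity and we obtain

\[ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left[-\frac{(x - \mu_X)^2}{2\sigma_X^2}\right] \tag{3.103} \]

In a similar manner, the marginal pdf of Y is

\[ f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\exp\left[-\frac{(y - \mu_Y)^2}{2\sigma_Y^2}\right] \tag{3.104} \]

(b) When ρ = 0, Eq. (3.88) reduces to

\[ f_{XY}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y}\exp\left\{-\frac{1}{2}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right\} \]
\[ = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left[-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^2\right]\frac{1}{\sqrt{2\pi}\,\sigma_Y}\exp\left[-\frac{1}{2}\left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right] = f_X(x) f_Y(y) \]

Hence, X and Y are independent.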
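3.50. Show that ρ in Eq. (3.88) is the correlation coefficient of X and Y.

By Eqs. (3.50) and (3.53), the correlation coefficient of X and Y is

\[ \rho_{XY} = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) f_{XY}(x, y)\,dx\,dy \tag{3.105} \]

where $f_{XY}(x, y)$ is given by Eq. (3.88). By making a change in variables $v = (x - \mu_X)/\sigma_X$
and $w = (y - \mu_Y)/\sigma_Y$, we can write Eq. (3.105) as

\[ \rho_{XY} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} vw\,\frac{1}{2\pi(1 - \rho^2)^{1/2}}\exp\left[-\frac{1}{2(1 - \rho^2)}\left(v^2 - 2\rho vw + w^2\right)\right] dv\,dw \]
\[ = \int_{-\infty}^{\infty} \frac{w}{\sqrt{2\pi}}\left\{\int_{-\infty}^{\infty} \frac{v}{\sqrt{2\pi}\,(1 - \rho^2)^{1/2}}\exp\left[-\frac{(v - \rho w)^2}{2(1 - \rho^2)}\right] dv\right\} e^{-w^2/2}\,dw \]

The term in the curly braces is identified as the mean of V = N(ρw; 1 − ρ²), and
so

\[ \rho_{XY} = \int_{-\infty}^{\infty} \frac{w}{\sqrt{2\pi}}\,(\rho w)\,e^{-w^2/2}\,dw = \rho\int_{-\infty}^{\infty} w^2\,\frac{1}{\sqrt{2\pi}}\,e^{-w^2/2}\,dw \]

The last integral is the variance of W = N(0; 1), and so it is equal to 1 and we
obtain $\rho_{XY} = \rho$.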


3.51. Let (X, Y) be a bivariate normal r.v. with its pdf given by Eq. (3.88). Determine 
E(Y |x). 


By Eq. (3.58), 


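\[ E(Y | x) = \int_{-\infty}^{\infty} y\,f_{Y|X}(y | x)\,dy \tag{3.106} \]

where

\[ f_{Y|X}(y | x) = \frac{f_{XY}(x, y)}{f_X(x)} \tag{3.107} \]

Substituting Eqs. (3.88) and (3.103) into Eq. (3.107), and after some cancellation
and rearranging, we obtain

\[ f_{Y|X}(y | x) = \frac{1}{\sqrt{2\pi}\,\sigma_Y(1 - \rho^2)^{1/2}}\exp\left\{-\frac{1}{2\sigma_Y^2(1 - \rho^2)}\left[y - \rho\,\frac{\sigma_Y}{\sigma_X}(x - \mu_X) - \mu_Y\right]^2\right\} \]

which is equal to the pdf of a normal r.v. with mean $\mu_Y + \rho(\sigma_Y/\sigma_X)(x - \mu_X)$ and
variance $(1 - \rho^2)\sigma_Y^2$. Thus, we get

\[ E(Y | x) = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}(x - \mu_X) \tag{3.108} \]

Note that when X and Y are independent, then ρ = 0 and $E(Y | x) = \mu_Y = E(Y)$.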


3.52. The joint pdf of a bivariate r.v. (X, Y) is given by 


\[ f_{XY}(x, y) = \frac{1}{2\sqrt{3}\,\pi}\exp\left[-\frac{1}{3}\left(x^2 - xy + y^2 + x - 2y + 1\right)\right] \qquad -\infty < x, y < \infty \]

(a) Find the means of X and Y.
(b) Find the variances of X and Y.
(c) Find the correlation coefficient of X and Y.

We note that the term in the bracket of the exponential is a quadratic
function of x and y, and hence $f_{XY}(x, y)$ could be a pdf of a bivariate normal
r.v. If so, then it is simpler to solve equations for the various parameters.
Now, the given joint pdf of (X, Y) can be expressed as

\[ f_{XY}(x, y) = \frac{1}{2\sqrt{3}\,\pi}\exp\left[-\frac{1}{2}\,q(x, y)\right] \]

where

\[ q(x, y) = \frac{2}{3}\left(x^2 - xy + y^2 + x - 2y + 1\right) = \frac{2}{3}\left[x^2 - x(y - 1) + (y - 1)^2\right] \]

Comparing the above expressions with Eqs. (3.88) and (3.89), we see that
$f_{XY}(x, y)$ is the pdf of a bivariate normal r.v. with $\mu_X = 0$, $\mu_Y = 1$, and the
following equations:

\[ 2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2} = 2\sqrt{3}\,\pi \]
\[ (1 - \rho^2)\sigma_X^2 = (1 - \rho^2)\sigma_Y^2 = \frac{3}{2} \]
\[ \frac{2\rho}{\sigma_X\sigma_Y(1 - \rho^2)} = \frac{2}{3} \]

Solving for $\sigma_X^2$, $\sigma_Y^2$, and ρ, we get

\[ \sigma_X^2 = \sigma_Y^2 = 2 \qquad \text{and} \qquad \rho = \frac{1}{2} \]

Hence,
(a) The mean of X is zero, and the mean of Y is 1.
(b) The variance of both X and Y is 2.
(c) The correlation coefficient of X and Y is 1/2.


3.53. Consider a bivariate r.v. (X, Y), where X and Y denote the horizontal and vertical
miss distances, respectively, from a target when a bullet is fired. Assume that X 
and Y are independent and that the probability of the bullet landing on any point 
of the xy plane depends only on the distance of the point from the target. Show 
that (X, Y) is a bivariate normal r.v. 


From the assumption, we have 


\[ f_{XY}(x, y) = f_X(x) f_Y(y) = g(x^2 + y^2) \tag{3.109} \]

for some function g. Differentiating Eq. (3.109) with respect to x, we have

\[ f_X'(x)\,f_Y(y) = 2x\,g'(x^2 + y^2) \tag{3.110} \]

Dividing Eq. (3.110) by Eq. (3.109) and rearranging, we get

\[ \frac{f_X'(x)}{2x\,f_X(x)} = \frac{g'(x^2 + y^2)}{g(x^2 + y^2)} \tag{3.111} \]

Note that the left-hand side of Eq. (3.111) depends only on x, whereas the right-
hand side depends only on x² + y²; thus,

\[ \frac{f_X'(x)}{x\,f_X(x)} = c \tag{3.112} \]

where c is a constant. Rewriting Eq. (3.112) as

\[ \frac{f_X'(x)}{f_X(x)} = cx \qquad \text{or} \qquad \frac{d}{dx}\left[\ln f_X(x)\right] = cx \tag{3.113} \]

and integrating both sides, we get

\[ \ln f_X(x) = \frac{c}{2}\,x^2 + a \qquad \text{or} \qquad f_X(x) = k e^{cx^2/2} \]

where a and k are constants. By the properties of a pdf, the constant c must be
negative, and setting c = −1/σ², we have

\[ f_X(x) = k e^{-x^2/(2\sigma^2)} \]

Thus, by Eq. (2.71), X = N(0; σ²) and

\[ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/(2\sigma^2)} \]

In a similar way, we can obtain the pdf of Y as

\[ f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-y^2/(2\sigma^2)} \]

Since X and Y are independent, the joint pdf of (X, Y) is

\[ f_{XY}(x, y) = f_X(x) f_Y(y) = \frac{1}{2\pi\sigma^2}\,e^{-(x^2 + y^2)/(2\sigma^2)} \]

which indicates that (X, Y) is a bivariate normal r.v.


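3.54. Let $(X_1, X_2, \ldots, X_n)$ be an n-variate normal r.v. with its joint pdf given by Eq.
(3.92). Show that if the covariance of $X_i$ and $X_j$ is zero for i ≠ j, that is,

\[ \text{Cov}(X_i, X_j) = \sigma_{ij} = \begin{cases} \sigma_i^2 & i = j \\ 0 & i \ne j \end{cases} \tag{3.114} \]

then $X_1, X_2, \ldots, X_n$ are independent.

From Eq. (3.94) with Eq. (3.114), the covariance matrix K becomes

\[ K = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix} \tag{3.115} \]

It therefore follows that

\[ |\det K|^{1/2} = \sigma_1\sigma_2\cdots\sigma_n = \prod_{i=1}^{n} \sigma_i \tag{3.116} \]

and

\[ K^{-1} = \begin{bmatrix} \dfrac{1}{\sigma_1^2} & 0 & \cdots & 0 \\ 0 & \dfrac{1}{\sigma_2^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \dfrac{1}{\sigma_n^2} \end{bmatrix} \tag{3.117} \]

Then we can write

\[ (\mathbf{x} - \boldsymbol{\mu})^T K^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \sum_{i=1}^{n}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2 \tag{3.118} \]

so that Eq. (3.92) becomes

\[ f_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}\prod_{i=1}^{n}\sigma_i}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right] \tag{3.119} \]

Now Eq. (3.119) can be rewritten as

\[ f_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i) \tag{3.120} \]

where

\[ f_{X_i}(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\,e^{-(x_i - \mu_i)^2/(2\sigma_i^2)} \]

Thus we conclude that $X_1, X_2, \ldots, X_n$ are independent.


SUPPLEMENTARY PROBLEMS


3.55. Consider an experiment of tossing a fair coin three times. Let (X, Y) be a
bivariate r.v., where X denotes the number of heads on the first two tosses and Y
denotes the number of heads on the third toss.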
(a) Find the range of X. 

(b) Find the range of Y.
(c) Find the range of (X, Y).
(d) Find (i) P(X ≤ 2, Y ≤ 1); (ii) P(X ≤ 1, Y ≤ 1); and (iii) P(X ≤ 0, Y ≤ 0).


3.56. Let $F_{XY}(x, y)$ be a joint cdf of a bivariate r.v. (X, Y). Show that

\[ P(X > a, Y > c) = 1 - F_X(a) - F_Y(c) + F_{XY}(a, c) \]

where $F_X(x)$ and $F_Y(y)$ are marginal cdf’s of X and Y, respectively.


3.57. Let the joint pmf of (X, Y) be given by 


\[ p_{XY}(x_i, y_j) = \begin{cases} k(x_i + y_j) & x_i = 1, 2, 3;\ y_j = 1, 2 \\ 0 & \text{otherwise} \end{cases} \]

where k is a constant.
(a) Find the value of k.
(b) Find the marginal pmf’s of X and Y.


3.58. The joint pdf of (X, Y) is given by 


\[ f_{XY}(x, y) = \begin{cases} k e^{-(x + 2y)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]

where k is a constant.
(a) Find the value of k.
(b) Find P(X > 1, Y < 1), P(X < Y), and P(X ≤ 2).


3.59. Let (X, Y) be a bivariate r.v., where X is a uniform r.v. over (0, 0.2) and Y is an
exponential r.v. with parameter 5, and X and Y are independent.
(a) Find the joint pdf of (X, Y).
(b) Find P(Y ≤ X).


3.60. Let the joint pdf of (X, Y) be given by 


\[ f_{XY}(x, y) = \begin{cases} x e^{-x(y + 1)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]

(a) Show that $f_{XY}(x, y)$ satisfies Eq. (3.26).
(b) Find the marginal pdf’s of X and Y.


3.61. The joint pdf of (X, Y) is given by 


\[ f_{XY}(x, y) = \begin{cases} k x^2 (4 - y) & x < y < 2x,\ 0 < x < 2 \\ 0 & \text{otherwise} \end{cases} \]

where k is a constant.
(a) Find the value of k.
(b) Find the marginal pdf’s of X and Y.

3.62. The joint pdf of (X, Y) is given by

\[ f_{XY}(x, y) = \begin{cases} xy\,e^{-(x^2 + y^2)/2} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]

(a) Find the marginal pdf’s of X and Y.
(b) Are X and Y independent?

3.63. The joint pdf of (X, Y) is given by

\[ f_{XY}(x, y) = \begin{cases} e^{-(x + y)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]

(a) Are X and Y independent?
(b) Find the conditional pdf’s of X and Y.

3.64. The joint pdf of (X, Y) is given by

\[ f_{XY}(x, y) = \begin{cases} e^{-y} & 0 < x \le y \\ 0 & \text{otherwise} \end{cases} \]

(a) Find the conditional pdf’s of Y, given that X = x.
(b) Find the conditional cdf’s of Y, given that X = x.


3.65. Consider the bivariate r.v. (X, Y) of Prob. 3.14.
(a) Find the mean and the variance of X.
(b) Find the mean and the variance of Y.
(c) Find the covariance of X and Y.
(d) Find the correlation coefficient of X and Y.

3.66. Consider a bivariate r.v. (X, Y) with joint pdf

\[ f_{XY}(x, y) = \frac{1}{2\pi\sigma^2}\,e^{-(x^2 + y^2)/(2\sigma^2)} \qquad -\infty < x, y < \infty \]

Find P[(X, Y) | x² + y² ≤ a²].

3.67. Let (X, Y) be a bivariate normal r.v., where X and Y each have zero mean and
variance σ², and the correlation coefficient of X and Y is ρ. Find the joint pdf of
(X, Y).

3.68. The joint pdf of a bivariate r.v. (X, Y) is given by

\[ f_{XY}(x, y) = \frac{1}{\sqrt{3}\,\pi}\exp\left[-\frac{2}{3}\left(x^2 - xy + y^2\right)\right] \]

(a) Find the means and variances of X and Y.
(b) Find the correlation coefficient of X and Y.

3.69. Let (X, Y, Z) be a trivariate r.v., where X, Y, and Z are independent and each has
a uniform distribution over (0, 1). Compute P(X > Y > Z).


ANSWERS TO SUPPLEMENTARY PROBLEMS 


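3.55. (a) $R_X = \{0, 1, 2\}$
(b) $R_Y = \{0, 1\}$
(c) $R_{XY} = \{(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)\}$
(d) (i) P(X ≤ 2, Y ≤ 1) = 1; (ii) P(X ≤ 1, Y ≤ 1) = 3/4; and (iii) P(X ≤ 0, Y ≤ 0) = 1/8

3.56. Hint: Set $x_1 = a$, $y_1 = c$, and $x_2 = y_2 = \infty$ in Eq. (3.95) and use Eqs. (3.13) and
(3.14).

3.57. (a) k = 1/21
(b) $p_X(x_i) = \frac{1}{21}(2x_i + 3)$, $x_i = 1, 2, 3$
    $p_Y(y_j) = \frac{1}{21}(6 + 3y_j)$, $y_j = 1, 2$

3.58. (a) k = 2
(b) $P(X > 1, Y < 1) = e^{-1} - e^{-3} \approx 0.318$; $P(X < Y) = \frac{1}{3}$; $P(X \le 2) = 1 - e^{-2} \approx 0.865$

3.59. (a) \[ f_{XY}(x, y) = \begin{cases} 25e^{-5y} & 0 < x < 0.2,\ y > 0 \\ 0 & \text{otherwise} \end{cases} \]
(b) $P(Y \le X) = e^{-1} \approx 0.368$

3.60. (b) $f_X(x) = e^{-x}$, x > 0
    $f_Y(y) = \dfrac{1}{(y + 1)^2}$, y > 0

3.61. (a) k = 5/32
(b) $f_X(x) = \frac{5}{32}\,x^3\left(4 - \frac{3}{2}x\right)$, 0 < x < 2
    \[ f_Y(y) = \begin{cases} \frac{5}{32}\left(\frac{7}{24}\right) y^3 (4 - y) & 0 < y < 2 \\ \frac{5}{32}\left(\frac{1}{3}\right)(4 - y)\left(8 - \frac{1}{8}y^3\right) & 2 < y < 4 \\ 0 & \text{otherwise} \end{cases} \]

3.62. (a) $f_X(x) = x e^{-x^2/2}$, x > 0
    $f_Y(y) = y e^{-y^2/2}$, y > 0
(b) Yes

3.63. (a) Yes
(b) $f_{X|Y}(x | y) = e^{-x}$, x > 0
    $f_{Y|X}(y | x) = e^{-y}$, y > 0

3.64. (a) $f_{Y|X}(y | x) = e^{x - y}$, y ≥ x
(b) \[ F_{Y|X}(y | x) = \begin{cases} 0 & y \le x \\ 1 - e^{x - y} & y \ge x \end{cases} \]

3.65. (a) $E(X) = \frac{29}{18}$, $\text{Var}(X) = \frac{77}{324}$
(b) $E(Y) = \frac{14}{9}$, $\text{Var}(Y) = \frac{20}{81}$
(c) $\text{Cov}(X, Y) = -\frac{1}{162}$
(d) ρ = −0.025

3.66. $1 - e^{-a^2/(2\sigma^2)}$

3.67. \[ f_{XY}(x, y) = \frac{1}{2\pi\sigma^2(1 - \rho^2)^{1/2}}\exp\left[-\frac{1}{2}\,\frac{x^2 - 2\rho xy + y^2}{\sigma^2(1 - \rho^2)}\right] \]

3.68. (a) $\mu_X = \mu_Y = 0$, $\sigma_X^2 = \sigma_Y^2 = 1$
(b) ρ = 1/2

3.69. 1/6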
CHAPTER 4


Functions of Random Variables, Expectation, Limit 
Theorems 


4.1 Introduction 


In this chapter we study a few basic concepts of functions of random variables and 
investigate the expected value of a certain function of a random variable. The 
techniques of moment generating functions and characteristic functions, which are 
very useful in some applications, are presented. Finally, the laws of large numbers 
and the central limit theorem, which is one of the most remarkable results in 
probability theory, are discussed. 


4.2 Functions of One Random Variable 
A. Random Variable g(X):
Given a r.v. X and a function g(x), the expression

$$Y = g(X) \tag{4.1}$$

defines a new r.v. Y. With y a given number, we denote by $D_Y$ the subset of $R_X$
(the range of X) such that $g(x) \le y$. Then

$$(Y \le y) = [g(X) \le y] = (X \in D_Y) \tag{4.2}$$

where $(X \in D_Y)$ is the event consisting of all outcomes $\zeta$ such that the point
$X(\zeta) \in D_Y$. Hence,

$$F_Y(y) = P(Y \le y) = P[g(X) \le y] = P(X \in D_Y) \tag{4.3}$$

If X is a continuous r.v. with pdf $f_X(x)$, then

$$F_Y(y) = \int_{D_Y} f_X(x)\,dx \tag{4.4}$$


B. Determination of fy(y) from fx(x): 


Let X be a continuous r.v. with pdf $f_X(x)$. If the transformation y = g(x) is
one-to-one and has the inverse transformation

$$x = g^{-1}(y) = h(y) \tag{4.5}$$

then the pdf of Y is given by (Prob. 4.2)

$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = f_X[h(y)]\left|\frac{dh(y)}{dy}\right| \tag{4.6}$$


Note that if g(x) is a continuous monotonic increasing or decreasing function,
then the transformation y = g(x) is one-to-one. If the transformation y = g(x) is not
one-to-one, $f_Y(y)$ is obtained as follows: Denoting the real roots of y = g(x) by $x_k$,
that is,

$$y = g(x_1) = \cdots = g(x_k) = \cdots \tag{4.7}$$

then

$$f_Y(y) = \sum_k \frac{f_X(x_k)}{|g'(x_k)|} \tag{4.8}$$

where g'(x) is the derivative of g(x).
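As a rough numerical illustration of Eq. (4.8), the sketch below applies the root-sum formula to the hypothetical case Y = X^2 with X = N(0; 1), where the real roots are x = ±√y and g'(x) = 2x. The function names and the integration settings are assumptions made for this example only.

```python
import math

def std_normal_pdf(x):
    """pdf of N(0; 1)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def pdf_of_square(f_x, y):
    """pdf of Y = X^2 via Eq. (4.8): sum of f_X(x_k)/|g'(x_k)| over the real roots of y = x^2."""
    if y <= 0.0:
        return 0.0  # no real roots for y < 0; the single point y = 0 is ignored
    roots = (math.sqrt(y), -math.sqrt(y))
    return sum(f_x(x) / abs(2.0 * x) for x in roots)  # g'(x) = 2x

def total_mass(f_x, upper=50.0, steps=20_000):
    """Integrate f_Y over (0, upper) with the substitution y = s^2 (dy = 2 s ds),
    which removes the 1/sqrt(y) singularity at the origin."""
    s_max = math.sqrt(upper)
    h = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h  # midpoint rule in s
        total += pdf_of_square(f_x, s * s) * 2.0 * s * h
    return total

print(total_mass(std_normal_pdf))          # should be close to 1
print(pdf_of_square(std_normal_pdf, 1.0))  # should match exp(-1/2)/sqrt(2*pi)
```

The second printed value can be compared with the closed form obtained for this case in Prob. 4.7.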


4.3 Functions of Two Random Variables 


A. One Function of Two Random Variables: 


Given two r.v.’s X and Y and a function g(x, y), the expression 


$$Z = g(X, Y) \tag{4.9}$$

defines a new r.v. Z. With z a given number, we denote by $D_Z$ the subset of $R_{XY}$
[the range of (X, Y)] such that $g(x, y) \le z$. Then

$$(Z \le z) = [g(X, Y) \le z] = \{(X, Y) \in D_Z\} \tag{4.10}$$

where $\{(X, Y) \in D_Z\}$ is the event consisting of all outcomes $\zeta$ such that the point
$\{X(\zeta), Y(\zeta)\} \in D_Z$. Hence,

$$F_Z(z) = P(Z \le z) = P[g(X, Y) \le z] = P\{(X, Y) \in D_Z\} \tag{4.11}$$

If X and Y are continuous r.v.’s with joint pdf $f_{XY}(x, y)$, then

$$F_Z(z) = \iint_{D_Z} f_{XY}(x, y)\,dx\,dy \tag{4.12}$$


B. Two Functions of Two Random Variables: 
Given two r.v.’s X and Y and two functions g(x, y) and h(x, y), the expression 


$$Z = g(X, Y) \qquad W = h(X, Y) \tag{4.13}$$

defines two new r.v.’s Z and W. With z and w two given numbers, we denote by $D_{ZW}$
the subset of $R_{XY}$ [the range of (X, Y)] such that $g(x, y) \le z$ and $h(x, y) \le w$. Then

$$(Z \le z, W \le w) = [g(X, Y) \le z,\ h(X, Y) \le w] = \{(X, Y) \in D_{ZW}\} \tag{4.14}$$

where $\{(X, Y) \in D_{ZW}\}$ is the event consisting of all outcomes $\zeta$ such that the point
$\{X(\zeta), Y(\zeta)\} \in D_{ZW}$. Hence,

$$F_{ZW}(z, w) = P(Z \le z, W \le w) = P[g(X, Y) \le z,\ h(X, Y) \le w] = P\{(X, Y) \in D_{ZW}\} \tag{4.15}$$

In the continuous case, we have

$$F_{ZW}(z, w) = \iint_{D_{ZW}} f_{XY}(x, y)\,dx\,dy \tag{4.16}$$


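Determination of $f_{ZW}(z, w)$ from $f_{XY}(x, y)$:

Let X and Y be two continuous r.v.’s with joint pdf $f_{XY}(x, y)$. If the transformation

$$z = g(x, y) \qquad w = h(x, y) \tag{4.17}$$

is one-to-one and has the inverse transformation

$$x = q(z, w) \qquad y = r(z, w) \tag{4.18}$$

then the joint pdf of Z and W is given by

$$f_{ZW}(z, w) = f_{XY}(x, y)\,|J(x, y)|^{-1} \tag{4.19}$$

where x = q(z, w), y = r(z, w), and

$$J(x, y) = \begin{vmatrix} \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \\[2mm] \dfrac{\partial h}{\partial x} & \dfrac{\partial h}{\partial y} \end{vmatrix} = \begin{vmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \\[2mm] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{vmatrix} \tag{4.20}$$

which is the Jacobian of the transformation (4.17). If we define

$$\bar{J}(z, w) = \begin{vmatrix} \dfrac{\partial q}{\partial z} & \dfrac{\partial q}{\partial w} \\[2mm] \dfrac{\partial r}{\partial z} & \dfrac{\partial r}{\partial w} \end{vmatrix} = \begin{vmatrix} \dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w} \\[2mm] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w} \end{vmatrix} \tag{4.21}$$

then

$$|\bar{J}(z, w)| = |J(x, y)|^{-1} \tag{4.22}$$

and Eq. (4.19) can be expressed as

$$f_{ZW}(z, w) = f_{XY}[q(z, w), r(z, w)]\,|\bar{J}(z, w)| \tag{4.23}$$
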
4.4 Functions of n Random Variables
A. One Function of n Random Variables:
Given n r.v.’s $X_1, \ldots, X_n$ and a function $g(x_1, \ldots, x_n)$, the expression

$$Y = g(X_1, \ldots, X_n) \tag{4.24}$$

defines a new r.v. Y. Then

$$(Y \le y) = [g(X_1, \ldots, X_n) \le y] = [(X_1, \ldots, X_n) \in D_Y] \tag{4.25}$$

and

$$F_Y(y) = P[g(X_1, \ldots, X_n) \le y] = P[(X_1, \ldots, X_n) \in D_Y] \tag{4.26}$$

where $D_Y$ is the subset of the range of $(X_1, \ldots, X_n)$ such that $g(x_1, \ldots, x_n) \le y$.
If $X_1, \ldots, X_n$ are continuous r.v.’s with joint pdf $f_{X_1 \cdots X_n}(x_1, \ldots, x_n)$, then

$$F_Y(y) = \int \cdots \int_{D_Y} f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \tag{4.27}$$


B. n Functions of n Random Variables: 


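When the joint pdf of n r.v.’s $X_1, \ldots, X_n$ is given and we want to determine the joint
pdf of n r.v.’s $Y_1, \ldots, Y_n$, where

$$\begin{aligned} Y_1 &= g_1(X_1, \ldots, X_n) \\ &\ \,\vdots \\ Y_n &= g_n(X_1, \ldots, X_n) \end{aligned} \tag{4.28}$$

the approach is the same as for two r.v.’s. We shall assume that the transformation

$$\begin{aligned} y_1 &= g_1(x_1, \ldots, x_n) \\ &\ \,\vdots \\ y_n &= g_n(x_1, \ldots, x_n) \end{aligned} \tag{4.29}$$

is one-to-one and has the inverse transformation

$$\begin{aligned} x_1 &= h_1(y_1, \ldots, y_n) \\ &\ \,\vdots \\ x_n &= h_n(y_1, \ldots, y_n) \end{aligned} \tag{4.30}$$

Then the joint pdf of $Y_1, \ldots, Y_n$ is given by

$$f_{Y_1 \cdots Y_n}(y_1, \ldots, y_n) = f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\,|J(x_1, \ldots, x_n)|^{-1} \tag{4.31}$$

where

$$J(x_1, \ldots, x_n) = \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{vmatrix} \tag{4.32}$$

which is the Jacobian of the transformation (4.29).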


4.5 Expectation 
A. Expectation of a Function of One Random Variable: 
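The expectation of Y = g(X) is given by

$$E(Y) = E[g(X)] = \begin{cases} \displaystyle\sum_i g(x_i)\,p_X(x_i) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx & \text{(continuous case)} \end{cases} \tag{4.33}$$

B. Expectation of a Function of More Than One Random Variable:

Let $X_1, \ldots, X_n$ be n r.v.’s, and let $Y = g(X_1, \ldots, X_n)$. Then

$$E(Y) = E[g(X_1, \ldots, X_n)] = \begin{cases} \displaystyle\sum_{x_1} \cdots \sum_{x_n} g(x_1, \ldots, x_n)\,p_{X_1 \cdots X_n}(x_1, \ldots, x_n) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n)\,f_{X_1 \cdots X_n}(x_1, \ldots, x_n)\,dx_1 \cdots dx_n & \text{(continuous case)} \end{cases} \tag{4.34}$$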


C. Linearity Property of Expectation: 


Note that the expectation operation is linear (Prob. 4.45), and we have

$$E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i) \tag{4.35}$$

where the $a_i$’s are constants. If r.v.’s X and Y are independent, then we have (Prob. 4.47)

$$E[g(X)h(Y)] = E[g(X)]\,E[h(Y)] \tag{4.36}$$

The relation (4.36) can be generalized to a mutually independent set of n r.v.’s
$X_1, \ldots, X_n$:

$$E\left[\prod_{i=1}^{n} g_i(X_i)\right] = \prod_{i=1}^{n} E[g_i(X_i)] \tag{4.37}$$


D. Conditional Expectation as a Random Variable: 


In Sec. 3.8 we defined the conditional expectation of Y given X = x, E(Y | x) [Eq.
(3.58)], which is, in general, a function of x, say H(x). Now H(X) is a function of
the r.v. X; that is,

$$H(X) = E(Y \mid X) \tag{4.38}$$

Thus, E(Y | X) is a function of the r.v. X. Note that E(Y | X) has the following
property (Prob. 4.38):

$$E[E(Y \mid X)] = E(Y) \tag{4.39}$$
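The property in Eq. (4.39) can be checked on a small discrete example. The sketch below uses a hypothetical joint pmf (the numbers are made up for illustration) and exact rational arithmetic; H(x) = E(Y | X = x) is computed from the joint pmf and then averaged over the marginal pmf of X.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p_XY(x, y), chosen only for illustration.
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(1, 4), (1, 2): F(1, 4)}

def marginal_x(pmf):
    """p_X(x) = sum over y of p_XY(x, y)."""
    out = {}
    for (x, _), p in pmf.items():
        out[x] = out.get(x, F(0)) + p
    return out

def cond_exp_y_given_x(pmf, x):
    """H(x) = E(Y | X = x) = sum_y y p_XY(x, y) / p_X(x)."""
    px = marginal_x(pmf)[x]
    return sum(y * p for (xx, y), p in pmf.items() if xx == x) / px

def exp_y(pmf):
    """E(Y) computed directly from the joint pmf."""
    return sum(y * p for (_, y), p in pmf.items())

def exp_of_cond_exp(pmf):
    """E[E(Y | X)] = sum_x H(x) p_X(x)."""
    return sum(cond_exp_y_given_x(pmf, x) * px for x, px in marginal_x(pmf).items())

print(exp_y(joint), exp_of_cond_exp(joint))  # both equal 7/8
```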


E. Jensen’s Inequality: 


A twice differentiable real-valued function g(x) is said to be convex if $g''(x) \ge 0$ for
all x; similarly, it is said to be concave if $g''(x) \le 0$.

Examples of convex functions include $x^2$, $|x|$, $e^x$, $x \log x$ (x > 0), and so on.
Examples of concave functions include $\log x$ and $\sqrt{x}$ ($x \ge 0$). If g(x) is convex,
then h(x) = −g(x) is concave and vice versa.

Jensen’s Inequality:

If g(x) is a convex function, then

$$E[g(X)] \ge g(E[X]) \tag{4.40}$$

provided that the expectations exist and are finite.
Equation (4.40) is known as Jensen’s inequality (for proof see Prob. 4.50).


F. Cauchy-Schwarz Inequality: 
Assume that $E(X^2), E(Y^2) < \infty$; then

$$E(|XY|) \le \sqrt{E(X^2)\,E(Y^2)} \tag{4.41}$$

Equation (4.41) is known as the Cauchy-Schwarz inequality (for proof see Prob. 4.51).
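A quick numerical sanity check of Eqs. (4.40) and (4.41) may help fix the direction of the inequalities. The sketch below assumes a hypothetical three-point pmf for a positive r.v. X, takes g(x) = x^2 (convex) and log x (concave), and uses Y = 1/X for the Cauchy-Schwarz check; none of these choices come from the text.

```python
import math

# Hypothetical pmf of a positive discrete r.v. X, for illustration only.
pmf_x = {1.0: 0.2, 2.0: 0.5, 4.0: 0.3}

def expect(g, pmf):
    """E[g(X)] for a discrete r.v., as in Eq. (4.33)."""
    return sum(g(x) * p for x, p in pmf.items())

mean_x = expect(lambda x: x, pmf_x)

# Jensen with the convex function g(x) = x^2: E[g(X)] - g(E[X]) >= 0.
jensen_gap_convex = expect(lambda x: x * x, pmf_x) - mean_x ** 2

# For the concave function log x the inequality reverses: log(E[X]) - E[log X] >= 0.
jensen_gap_concave = math.log(mean_x) - expect(math.log, pmf_x)

# Cauchy-Schwarz with Y = 1/X, so that |XY| = 1.
lhs = expect(lambda x: abs(x * (1.0 / x)), pmf_x)
rhs = math.sqrt(expect(lambda x: x * x, pmf_x) * expect(lambda x: 1.0 / (x * x), pmf_x))

print(jensen_gap_convex, jensen_gap_concave, lhs, rhs)
```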


4.6 Probability Generating Functions 


A. Definition: 


Let X be a nonnegative integer-valued discrete r.v. with pmf $p_X(x)$. The probability
generating function (or z-transform) of X is defined by

$$G_X(z) = E(z^X) = \sum_{x=0}^{\infty} p_X(x)\,z^x \tag{4.42}$$

where z is a variable.
Note that

$$|G_X(z)| \le \sum_{x=0}^{\infty} |p_X(x)|\,|z|^x \le \sum_{x=0}^{\infty} p_X(x) = 1 \qquad \text{for } |z| \le 1 \tag{4.43}$$


B. Properties of Gx(z): 


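Differentiating Eq. (4.42) repeatedly, we have

$$G_X'(z) = \sum_{x=1}^{\infty} x\,p_X(x)\,z^{x-1} = p_X(1) + 2p_X(2)z + 3p_X(3)z^2 + \cdots \tag{4.44}$$

$$G_X''(z) = \sum_{x=2}^{\infty} x(x-1)\,p_X(x)\,z^{x-2} = 2p_X(2) + 3 \cdot 2\,p_X(3)z + \cdots \tag{4.45}$$

In general,

$$G_X^{(n)}(z) = \sum_{x=n}^{\infty} x(x-1)\cdots(x-n+1)\,p_X(x)\,z^{x-n} = \sum_{x=n}^{\infty} \binom{x}{n} n!\,p_X(x)\,z^{x-n} \tag{4.46}$$

Then, we have

(1) $p_X(0) = P(X = 0) = G_X(0)$    (4.47)

(2) $p_X(n) = P(X = n) = \dfrac{1}{n!}\,G_X^{(n)}(0)$    (4.48)

(3) $E(X) = G_X'(1)$    (4.49)

(4) $E[X(X-1)(X-2)\cdots(X-n+1)] = G_X^{(n)}(1)$    (4.50)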

One of the useful properties of the probability generating function is that it
turns a sum into a product: if $X_1$ and $X_2$ are independent, then by Eq. (4.36)

$$E\left(z^{X_1 + X_2}\right) = E\left(z^{X_1} z^{X_2}\right) = E\left(z^{X_1}\right) E\left(z^{X_2}\right) \tag{4.51}$$

Suppose that $X_1, X_2, \ldots, X_n$ are independent nonnegative integer-valued r.v.’s, and
let $Y = X_1 + X_2 + \cdots + X_n$. Then

(5) $G_Y(z) = \displaystyle\prod_{i=1}^{n} G_{X_i}(z)$    (4.52)

Note that property (2) indicates that the probability generating function
determines the distribution. Property (4) is known as the nth factorial moment.
Setting n = 2 in Eq. (4.50) and by Eq. (4.35), we have

$$E[X(X-1)] = E(X^2 - X) = E(X^2) - E(X) = G_X''(1) \tag{4.53}$$

Thus, using Eq. (4.49),

$$E(X^2) = G_X'(1) + G_X''(1) \tag{4.54}$$

Using Eq. (2.31), we obtain

$$\mathrm{Var}(X) = G_X'(1) + G_X''(1) - [G_X'(1)]^2 \tag{4.55}$$

C. Lemma for Probability Generating Function: 
Lemma 4.1: If two nonnegative integer-valued discrete r.v.’s have the same 


probability generating functions, then they must have the same distribution. 
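For a r.v. with finitely many values, the probability generating function is a polynomial whose coefficients are the pmf, so properties (3) to (5) can be tried out directly. The sketch below is one possible illustration, assuming a binomial r.v. with hypothetical parameters n = 4 and p = 1/3; the helper names are not from the text.

```python
from fractions import Fraction as F
from math import comb

def pgf_derivative_at_one(pmf, order):
    """G_X^{(order)}(1) = sum_x x(x-1)...(x-order+1) p_X(x), the factorial moment of Eq. (4.50).
    pmf[x] holds p_X(x) for x = 0, 1, ..., len(pmf) - 1."""
    total = F(0)
    for x, p in enumerate(pmf):
        falling = 1
        for k in range(order):
            falling *= (x - k)
        total += falling * p
    return total

def mean_and_variance(pmf):
    """E(X) from Eq. (4.49) and Var(X) from Eq. (4.55)."""
    g1 = pgf_derivative_at_one(pmf, 1)
    g2 = pgf_derivative_at_one(pmf, 2)
    return g1, g1 + g2 - g1 * g1

def pgf_product(pmf_a, pmf_b):
    """Coefficients of G_A(z) G_B(z), i.e. the pmf of A + B for independent A and B (Eq. (4.52))."""
    out = [F(0)] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b
    return out

p = F(1, 3)
bernoulli = [1 - p, p]                                              # pmf of one trial
binomial4 = [comb(4, k) * p**k * (1 - p)**(4 - k) for k in range(5)]

# Sum of four independent Bernoulli r.v.'s, built by multiplying their generating functions.
sum_of_four = bernoulli
for _ in range(3):
    sum_of_four = pgf_product(sum_of_four, bernoulli)

print(mean_and_variance(binomial4))   # (4/3, 8/9), i.e. np and np(1 - p)
print(sum_of_four == binomial4)       # True
```

Because the two pmf's agree coefficient by coefficient, the example is also a small instance of Lemma 4.1.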


4.7 Moment Generating Functions 


A. Definition: 
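The moment generating function of a r.v. X is defined by

$$M_X(t) = E(e^{tX}) = \begin{cases} \displaystyle\sum_i e^{t x_i}\,p_X(x_i) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx & \text{(continuous case)} \end{cases} \tag{4.56}$$

where t is a real variable. Note that $M_X(t)$ may not exist for all r.v.’s X. In general,
$M_X(t)$ will exist only for those values of t for which the sum or integral of Eq.
(4.56) converges absolutely. Suppose that $M_X(t)$ exists. If we expand $e^{tX}$ formally
as a power series and take expectation, then

$$\begin{aligned} M_X(t) = E(e^{tX}) &= E\left[1 + tX + \frac{1}{2!}(tX)^2 + \cdots + \frac{1}{k!}(tX)^k + \cdots\right] \\ &= 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \cdots + \frac{t^k}{k!}E(X^k) + \cdots \end{aligned} \tag{4.57}$$

and the kth moment of X is given by

$$m_k = E(X^k) = M_X^{(k)}(0) \qquad k = 1, 2, \ldots \tag{4.58}$$

where

$$M_X^{(k)}(0) = \left.\frac{d^k}{dt^k} M_X(t)\right|_{t=0} \tag{4.59}$$

Note that by substituting $e^t$ for z, we can obtain the moment generating function
for a nonnegative integer-valued discrete r.v. from the probability generating
function.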


B. Joint Moment Generating Function: 


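The joint moment generating function $M_{XY}(t_1, t_2)$ of two r.v.’s X and Y is defined
by

$$M_{XY}(t_1, t_2) = E\left[e^{(t_1 X + t_2 Y)}\right] \tag{4.60}$$

where $t_1$ and $t_2$ are real variables. Proceeding as we did in Eq. (4.57), we can
establish that

$$M_{XY}(t_1, t_2) = E\left[e^{(t_1 X + t_2 Y)}\right] = \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} \frac{t_1^k\,t_2^n}{k!\,n!}\,E(X^k Y^n) \tag{4.61}$$

and the (k, n) joint moment of X and Y is given by

$$m_{kn} = E(X^k Y^n) = M_{XY}^{(kn)}(0, 0) \tag{4.62}$$

where

$$M_{XY}^{(kn)}(0, 0) = \left.\frac{\partial^{k+n}}{\partial t_1^k\,\partial t_2^n} M_{XY}(t_1, t_2)\right|_{t_1 = t_2 = 0} \tag{4.63}$$

In a similar fashion, we can define the joint moment generating function of n
r.v.’s $X_1, \ldots, X_n$ by

$$M_{X_1 \cdots X_n}(t_1, \ldots, t_n) = E\left[e^{(t_1 X_1 + \cdots + t_n X_n)}\right] \tag{4.64}$$

from which the various moments can be computed. If $X_1, \ldots, X_n$ are independent,
then

$$\begin{aligned} M_{X_1 \cdots X_n}(t_1, \ldots, t_n) &= E\left[e^{(t_1 X_1 + \cdots + t_n X_n)}\right] = E\left(e^{t_1 X_1} \cdots e^{t_n X_n}\right) \\ &= E\left(e^{t_1 X_1}\right) \cdots E\left(e^{t_n X_n}\right) = M_{X_1}(t_1) \cdots M_{X_n}(t_n) \end{aligned} \tag{4.65}$$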


C. Lemmas for Moment Generating Functions: 


Two important lemmas concerning moment generating functions are stated in the 
following: 


Lemma 4.2: If two r.v.’s have the same moment generating functions, then they 
must have the same distribution. 

Lemma 4.3: Given cdf’s $F(x), F_1(x), F_2(x), \ldots$ with corresponding moment
generating functions $M(t), M_1(t), M_2(t), \ldots$, then $F_n(x) \to F(x)$ if $M_n(t) \to M(t)$.
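Equation (4.58) says that moments are derivatives of the moment generating function at t = 0. As a rough numerical sketch, the code below assumes a fair six-sided die as a hypothetical example and estimates the first two derivatives by central differences; the step size h is an arbitrary choice for this illustration.

```python
import math

def mgf_die(t):
    """M_X(t) = E(e^{tX}) for a fair six-sided die (hypothetical example), from Eq. (4.56)."""
    return sum(math.exp(t * k) for k in range(1, 7)) / 6.0

def first_two_moments(mgf, h=1e-3):
    """Central-difference estimates of M'(0) = E(X) and M''(0) = E(X^2), cf. Eq. (4.58)."""
    m1 = (mgf(h) - mgf(-h)) / (2.0 * h)
    m2 = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / (h * h)
    return m1, m2

m1, m2 = first_two_moments(mgf_die)
print(m1, m2)  # close to 3.5 and 91/6
```

The exact values follow from the pmf: E(X) = 21/6 and E(X^2) = 91/6.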


4.8 Characteristic Functions 


A. Definition: 
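The characteristic function of a r.v. X is defined by

$$\Psi_X(\omega) = E(e^{j\omega X}) = \begin{cases} \displaystyle\sum_i e^{j\omega x_i}\,p_X(x_i) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty} e^{j\omega x} f_X(x)\,dx & \text{(continuous case)} \end{cases} \tag{4.66}$$

where $\omega$ is a real variable and $j = \sqrt{-1}$. Note that $\Psi_X(\omega)$ is obtained by replacing
t in $M_X(t)$ by $j\omega$ if $M_X(t)$ exists. Thus, the characteristic function has all the
properties of the moment generating function. Now

$$|\Psi_X(\omega)| = \left|\sum_i e^{j\omega x_i}\,p_X(x_i)\right| \le \sum_i \left|e^{j\omega x_i}\,p_X(x_i)\right| = \sum_i p_X(x_i) = 1 < \infty$$

for the discrete case and

$$|\Psi_X(\omega)| = \left|\int_{-\infty}^{\infty} e^{j\omega x} f_X(x)\,dx\right| \le \int_{-\infty}^{\infty} \left|e^{j\omega x} f_X(x)\right| dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1 < \infty$$

for the continuous case. Thus, the characteristic function $\Psi_X(\omega)$ is always defined
even if the moment generating function $M_X(t)$ is not (Prob. 4.76). Note that $\Psi_X(\omega)$
of Eq. (4.66) for the continuous case is the Fourier transform (with the sign of j
reversed) of $f_X(x)$. Because of this fact, if $\Psi_X(\omega)$ is known, $f_X(x)$ can be found
from the inverse Fourier transform; that is,

$$f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Psi_X(\omega)\,e^{-j\omega x}\,d\omega \tag{4.67}$$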
ar 


B. Joint Characteristic Functions: 


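The joint characteristic function $\Psi_{XY}(\omega_1, \omega_2)$ of two r.v.’s X and Y is defined by

$$\Psi_{XY}(\omega_1, \omega_2) = E\left[e^{j(\omega_1 X + \omega_2 Y)}\right] = \begin{cases} \displaystyle\sum_i \sum_k e^{j(\omega_1 x_i + \omega_2 y_k)}\,p_{XY}(x_i, y_k) & \text{(discrete case)} \\[3mm] \displaystyle\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{j(\omega_1 x + \omega_2 y)} f_{XY}(x, y)\,dx\,dy & \text{(continuous case)} \end{cases} \tag{4.68}$$

where $\omega_1$ and $\omega_2$ are real variables.

The expression of Eq. (4.68) for the continuous case is recognized as the
two-dimensional Fourier transform (with the sign of j reversed) of $f_{XY}(x, y)$. Thus,
from the inverse Fourier transform, we have

$$f_{XY}(x, y) = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi_{XY}(\omega_1, \omega_2)\,e^{-j(\omega_1 x + \omega_2 y)}\,d\omega_1\,d\omega_2 \tag{4.69}$$

From Eqs. (4.66) and (4.68), we see that

$$\Psi_X(\omega) = \Psi_{XY}(\omega, 0) \qquad \Psi_Y(\omega) = \Psi_{XY}(0, \omega) \tag{4.70}$$

which are called marginal characteristic functions.
Similarly, we can define the joint characteristic function of n r.v.’s $X_1, \ldots, X_n$ by

$$\Psi_{X_1 \cdots X_n}(\omega_1, \ldots, \omega_n) = E\left[e^{j(\omega_1 X_1 + \cdots + \omega_n X_n)}\right] \tag{4.71}$$

As in the case of the moment generating function, if $X_1, \ldots, X_n$ are independent,
then

$$\Psi_{X_1 \cdots X_n}(\omega_1, \ldots, \omega_n) = \Psi_{X_1}(\omega_1) \cdots \Psi_{X_n}(\omega_n) \tag{4.72}$$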


C. Lemmas for Characteristic Functions: 


As with the moment generating function, we have the following two lemmas: 


Lemma 4.4: A distribution function is uniquely determined by its characteristic 
function. 

Lemma 4.5: Given cdf’s $F(x), F_1(x), F_2(x), \ldots$ with corresponding characteristic
functions $\Psi(\omega), \Psi_1(\omega), \Psi_2(\omega), \ldots$, then $F_n(x) \to F(x)$ at points of continuity of
F(x) if and only if $\Psi_n(\omega) \to \Psi(\omega)$ for every $\omega$.


4.9 The Laws of Large Numbers and the Central Limit Theorem 


A. The Weak Law of Large Numbers: 


Let $X_1, \ldots, X_n$ be a sequence of independent, identically distributed r.v.’s each with
a finite mean $E(X_i) = \mu$. Let

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{n}(X_1 + \cdots + X_n) \tag{4.73}$$

Then, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0 \tag{4.74}$$

Equation (4.74) is known as the weak law of large numbers, and $\bar{X}_n$ is known as
the sample mean.

B. The Strong Law of Large Numbers: 


Let $X_1, \ldots, X_n$ be a sequence of independent, identically distributed r.v.’s each with
a finite mean $E(X_i) = \mu$. Then, for any $\varepsilon > 0$,

$$P\left(\lim_{n \to \infty} |\bar{X}_n - \mu| > \varepsilon\right) = 0 \tag{4.75}$$

where $\bar{X}_n$ is the sample mean defined by Eq. (4.73). Equation (4.75) is known as
the strong law of large numbers.

Notice the important difference between Eqs. (4.74) and (4.75). Equation (4.74)
tells us how a sequence of probabilities converges, and Eq. (4.75) tells us how the
sequence of r.v.’s behaves in the limit. The strong law of large numbers tells us
that the sequence $(\bar{X}_n)$ converges to the constant $\mu$ with probability 1.


C. The Central Limit Theorem: 


The central limit theorem is one of the most remarkable results in probability 
theory. There are many versions of this theorem. In its simplest form, the central 
limit theorem is stated as follows: 


Let $X_1, \ldots, X_n$ be a sequence of independent, identically distributed r.v.’s each with
mean $\mu$ and variance $\sigma^2$. Let

$$Z_n = \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \tag{4.76}$$

where $\bar{X}_n$ is defined by Eq. (4.73). Then the distribution of $Z_n$ tends to the standard
normal as $n \to \infty$; that is,

$$\lim_{n \to \infty} Z_n = N(0; 1) \tag{4.77}$$

or

$$\lim_{n \to \infty} F_{Z_n}(z) = \lim_{n \to \infty} P(Z_n \le z) = \Phi(z) \tag{4.78}$$

where $\Phi(z)$ is the cdf of a standard normal r.v. [Eq. (2.73)]. Thus, the central limit
theorem tells us that for large n, the distribution of the sum $S_n = X_1 + \cdots + X_n$ is
approximately normal regardless of the form of the distribution of the individual
$X_i$’s. Notice how much stronger this theorem is than the laws of large numbers. In
practice, whenever an observed r.v. is known to be a sum of a large number of
r.v.’s, then the central limit theorem gives us some justification for assuming that
this sum is normally distributed.
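The two limit statements can be explored by simulation. The sketch below is only an illustration under assumed settings: the $X_i$ are taken uniform over (0, 1), so that μ = 1/2 and σ² = 1/12, and the sample sizes, trial counts, and seeds are arbitrary. It compares the empirical cdf of $Z_n$ from Eq. (4.76) with Φ(z) and also reports a sample mean as in Eq. (4.73).

```python
import math
import random

def sample_mean(n, seed=0):
    """Sample mean of n uniform(0, 1) draws, cf. Eq. (4.73)."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

def standardized_sum(n, rng):
    """Z_n of Eq. (4.76) for X_i uniform over (0, 1): mu = 1/2, sigma^2 = 1/12."""
    s = sum(rng.random() for _ in range(n))
    return (s - n * 0.5) / (math.sqrt(1.0 / 12.0) * math.sqrt(n))

def std_normal_cdf(z):
    """Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def empirical_cdf_at(z, n=30, trials=20_000, seed=0):
    """Fraction of simulated Z_n values that are <= z."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if standardized_sum(n, rng) <= z)
    return hits / trials

print(sample_mean(100_000))                       # near 0.5
print(empirical_cdf_at(1.0), std_normal_cdf(1.0)) # the two values should be close
```

A simulation of this kind does not prove either theorem; it only shows the behavior they describe for one choice of distribution.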


SOLVED PROBLEMS 


Functions of One Random Variable 


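4.1. If X is $N(\mu; \sigma^2)$, then show that $Z = (X - \mu)/\sigma$ is a standard normal r.v.; that
is, N(0; 1).

The cdf of Z is

$$F_Z(z) = P(Z \le z) = P\left(\frac{X - \mu}{\sigma} \le z\right) = P(X \le z\sigma + \mu) = \int_{-\infty}^{z\sigma + \mu} \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x - \mu)^2/(2\sigma^2)}\,dx$$

By the change of variable $y = (x - \mu)/\sigma$ (that is, $x = \sigma y + \mu$), we obtain

$$F_Z(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,dy$$

and

$$f_Z(z) = \frac{dF_Z(z)}{dz} = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}$$

which indicates that Z = N(0; 1).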

4.2. Verify Eq. (4.6). 
Assume that y = g(x) is a continuous monotonically increasing function [Fig.
4-1(a)]. Since y = g(x) is monotonically increasing, it has an inverse that we
denote by $x = g^{-1}(y) = h(y)$. Then

[Fig. 4-1: (a) y = g(x) monotonically increasing; (b) y = g(x) monotonically decreasing]

$$F_Y(y) = P(Y \le y) = P[X \le h(y)] = F_X[h(y)] \tag{4.79}$$

and

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\{F_X[h(y)]\}$$

Applying the chain rule of differentiation to this expression yields

$$f_Y(y) = f_X[h(y)]\,\frac{d}{dy} h(y)$$

which can be written as

$$f_Y(y) = f_X(x)\,\frac{dx}{dy} \qquad x = h(y) \tag{4.80}$$

If y = g(x) is monotonically decreasing [Fig. 4-1(b)], then

$$F_Y(y) = P(Y \le y) = P[X > h(y)] = 1 - F_X[h(y)] \tag{4.81}$$

Thus,

$$f_Y(y) = \frac{d}{dy} F_Y(y) = -f_X(x)\,\frac{dx}{dy} \qquad x = h(y) \tag{4.82}$$

In Eq. (4.82), since y = g(x) is monotonically decreasing, dy/dx (and dx/dy) is
negative. Combining Eqs. (4.80) and (4.82), we obtain

$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = f_X[h(y)]\left|\frac{dh(y)}{dy}\right|$$

which is valid for any continuous monotonic (increasing or decreasing)
function y = g(x).

4.3. Let X be a r.v. with cdf $F_X(x)$ and pdf $f_X(x)$. Let Y = aX + b, where a and b
are real constants and $a \ne 0$.
(a) Find the cdf of Y in terms of $F_X(x)$.

(b) Find the pdf of Y in terms of $f_X(x)$.

(a) If a > 0, then [Fig. 4-2(a)]

[Fig. 4-2: the line y = ax + b for (a) a > 0 and (b) a < 0]

$$F_Y(y) = P(Y \le y) = P(aX + b \le y) = P\left(X \le \frac{y - b}{a}\right) = F_X\left(\frac{y - b}{a}\right) \tag{4.83}$$

If a < 0, then [Fig. 4-2(b)]

$$\begin{aligned} F_Y(y) &= P(Y \le y) = P(aX + b \le y) = P(aX \le y - b) \\ &= P\left(X \ge \frac{y - b}{a}\right) \qquad \text{(since } a < 0\text{, note the change in the inequality sign)} \\ &= 1 - P\left(X < \frac{y - b}{a}\right) \\ &= 1 - P\left(X \le \frac{y - b}{a}\right) + P\left(X = \frac{y - b}{a}\right) \\ &= 1 - F_X\left(\frac{y - b}{a}\right) + P\left(X = \frac{y - b}{a}\right) \end{aligned} \tag{4.84}$$

Note that if X is continuous, then P[X = (y − b)/a] = 0, and

$$F_Y(y) = 1 - F_X\left(\frac{y - b}{a}\right) \qquad a < 0 \tag{4.85}$$

(b) From Fig. 4-2, we see that y = g(x) = ax + b is a continuous
monotonically increasing (a > 0) or decreasing (a < 0) function. Its
inverse is $x = g^{-1}(y) = h(y) = (y - b)/a$, and dx/dy = 1/a. Thus, by Eq.
(4.6),

$$f_Y(y) = \frac{1}{|a|}\,f_X\left(\frac{y - b}{a}\right) \tag{4.86}$$

Note that Eq. (4.86) can also be obtained by differentiating Eqs. (4.83)
and (4.85) with respect to y.



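4.4. Let Y = aX + b. Determine the pdf of Y, if X is a uniform r.v. over (0, 1).

The pdf of X is [Eq. (2.56)]

$$f_X(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

Then by Eq. (4.86), we get

$$f_Y(y) = \begin{cases} \dfrac{1}{|a|} & y \in R_Y \\[2mm] 0 & \text{otherwise} \end{cases} \tag{4.87}$$

The range $R_Y$ is found as follows: From Fig. 4-3, we see that

[Fig. 4-3]

For a > 0: $R_Y = \{y: b < y < a + b\}$
For a < 0: $R_Y = \{y: a + b < y < b\}$

4.5. Let Y = aX + b. Show that if $X = N(\mu; \sigma^2)$, then $Y = N(a\mu + b; a^2\sigma^2)$, and
find the values of a and b so that Y = N(0; 1).

Since $X = N(\mu; \sigma^2)$, by Eq. (2.71),

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right]$$

Hence, by Eq. (4.86),

$$\begin{aligned} f_Y(y) &= \frac{1}{\sqrt{2\pi}\,|a|\,\sigma} \exp\left[-\frac{1}{2\sigma^2}\left(\frac{y - b}{a} - \mu\right)^2\right] \\ &= \frac{1}{\sqrt{2\pi}\,|a|\,\sigma} \exp\left\{-\frac{1}{2a^2\sigma^2}\,[y - (a\mu + b)]^2\right\} \end{aligned} \tag{4.88}$$

which is the pdf of $N(a\mu + b; a^2\sigma^2)$. Hence, $Y = N(a\mu + b; a^2\sigma^2)$. Next, let
$a\mu + b = 0$ and $a^2\sigma^2 = 1$, from which we get $a = 1/\sigma$ and $b = -\mu/\sigma$. Thus,
$Y = (X - \mu)/\sigma$ is N(0; 1) (see Prob. 4.1).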


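4.6. Let X be a r.v. with pdf $f_X(x)$. Let $Y = X^2$. Find the pdf of Y.

The event $A = (Y \le y)$ in $R_Y$ is equivalent to the event
$B = (-\sqrt{y} \le X \le \sqrt{y})$ in $R_X$ (Fig. 4-4). If $y \le 0$, then

[Fig. 4-4]

$$F_Y(y) = P(Y \le y) = 0$$

and $f_Y(y) = 0$. If y > 0, then

$$F_Y(y) = P(Y \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}) \tag{4.89}$$

and

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X(\sqrt{y}) - \frac{d}{dy} F_X(-\sqrt{y}) = \frac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right]$$

Thus,

$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right] & y > 0 \\[2mm] 0 & y \le 0 \end{cases} \tag{4.90}$$
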
Alternative Solution:

If y < 0, then the equation $y = x^2$ has no real solutions; hence, $f_Y(y) = 0$. If
y > 0, then $y = x^2$ has two solutions, $x_1 = \sqrt{y}$ and $x_2 = -\sqrt{y}$. Now, $y = g(x) = x^2$
and g'(x) = 2x. Hence, by Eq. (4.8),

$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right] & y > 0 \\[2mm] 0 & y < 0 \end{cases}$$

4.7. Let $Y = X^2$. Find the pdf of Y if X = N(0; 1).

Since X = N(0; 1),

$$f_X(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$$

Since $f_X(x)$ is an even function, by Eq. (4.90), we obtain

$$f_Y(y) = \begin{cases} \dfrac{1}{\sqrt{y}}\,f_X(\sqrt{y}) = \dfrac{1}{\sqrt{2\pi y}}\,e^{-y/2} & y > 0 \\[2mm] 0 & y < 0 \end{cases} \tag{4.91}$$

4.8. Let $Y = X^2$. Find and sketch the pdf of Y if X is a uniform r.v. over (−1, 2).


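The pdf of X is [Eq. (2.56)] [Fig. 4-5(a)]

$$f_X(x) = \begin{cases} \dfrac{1}{3} & -1 < x < 2 \\[2mm] 0 & \text{otherwise} \end{cases}$$

[Fig. 4-5: (a) $f_X(x)$; (b) $f_Y(y)$]

In this case, the range of Y is (0, 4), and we must be careful in applying Eq.
(4.90). When 0 < y < 1, both $\sqrt{y}$ and $-\sqrt{y}$ are in $R_X = (-1, 2)$, and by Eq.
(4.90),

$$f_Y(y) = \frac{1}{2\sqrt{y}}\left(\frac{1}{3} + \frac{1}{3}\right) = \frac{1}{3\sqrt{y}}$$

When 1 < y < 4, $\sqrt{y}$ is in $R_X = (-1, 2)$ but $-\sqrt{y} < -1$, and by Eq.
(4.90),

$$f_Y(y) = \frac{1}{2\sqrt{y}}\left(\frac{1}{3} + 0\right) = \frac{1}{6\sqrt{y}}$$

Thus,

$$f_Y(y) = \begin{cases} \dfrac{1}{3\sqrt{y}} & 0 < y < 1 \\[2mm] \dfrac{1}{6\sqrt{y}} & 1 < y < 4 \\[2mm] 0 & \text{otherwise} \end{cases}$$

which is sketched in Fig. 4-5(b).

4.9. Let $Y = e^X$. Find the pdf of Y if X is a uniform r.v. over (0, 1).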


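The pdf of X is

$$f_X(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

The cdf of Y is

$$F_Y(y) = P(Y \le y) = P(e^X \le y) = P(X \le \ln y) = \int_{-\infty}^{\ln y} f_X(x)\,dx = \int_0^{\ln y} dx = \ln y \qquad 1 < y < e$$

Thus,

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} \ln y = \frac{1}{y} \qquad 1 < y < e \tag{4.92}$$

Alternative Solution:

The function $y = g(x) = e^x$ is a continuous monotonically increasing function.
Its inverse is $x = g^{-1}(y) = h(y) = \ln y$. Thus, by Eq. (4.6), we obtain

$$f_Y(y) = f_X(\ln y)\left|\frac{d}{dy} \ln y\right| = \frac{1}{y}\,f_X(\ln y) = \begin{cases} \dfrac{1}{y} & 0 < \ln y < 1 \\[2mm] 0 & \text{otherwise} \end{cases}$$

or

$$f_Y(y) = \begin{cases} \dfrac{1}{y} & 1 < y < e \\[2mm] 0 & \text{otherwise} \end{cases}$$

4.10. Let $Y = e^X$. Find the pdf of Y if $X = N(\mu; \sigma^2)$.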


The pdf of X is [Eq. (2.71)]

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right] \qquad -\infty < x < \infty$$

Thus, using the technique shown in the alternative solution of Prob. 4.9, we
obtain

$$f_Y(y) = \frac{1}{y}\,f_X(\ln y) = \frac{1}{y\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(\ln y - \mu)^2\right] \qquad 0 < y < \infty \tag{4.93}$$

Note that X = ln Y is the normal r.v.; hence, the r.v. Y is called the log-normal
r.v.

4.11. Let X be a r.v. with pdf $f_X(x)$. Let Y = 1/X. Find the pdf of Y in terms of
$f_X(x)$.

We see that the inverse of y = 1/x is x = 1/y and $dx/dy = -1/y^2$. Thus, by Eq.
(4.6)

$$f_Y(y) = \frac{1}{y^2}\,f_X\left(\frac{1}{y}\right) \tag{4.94}$$

4.12. Let Y = 1/X and X be a Cauchy r.v. with parameter a. Show that Y is also a
Cauchy r.v. with parameter 1/a.

From Prob. 2.83, we have

$$f_X(x) = \frac{a/\pi}{a^2 + x^2} \qquad -\infty < x < \infty$$

By Eq. (4.94)

$$f_Y(y) = \frac{1}{y^2}\,\frac{a/\pi}{a^2 + (1/y)^2} = \frac{(1/a)/\pi}{(1/a)^2 + y^2} \qquad -\infty < y < \infty$$

which indicates that Y is also a Cauchy r.v. with parameter 1/a.

4.13. Let Y = tan X. Find the pdf of Y if X is a uniform r.v. over (−π/2, π/2).


The cdf of X is [Eq. (2.57)]

$$F_X(x) = \begin{cases} 0 & x \le -\pi/2 \\[1mm] \dfrac{1}{\pi}\left(x + \dfrac{\pi}{2}\right) & -\pi/2 < x < \pi/2 \\[2mm] 1 & x \ge \pi/2 \end{cases}$$

Now

$$\begin{aligned} F_Y(y) &= P(Y \le y) = P(\tan X \le y) = P(X \le \tan^{-1} y) \\ &= F_X(\tan^{-1} y) = \frac{1}{\pi}\left(\tan^{-1} y + \frac{\pi}{2}\right) = \frac{1}{2} + \frac{1}{\pi} \tan^{-1} y \qquad -\infty < y < \infty \end{aligned}$$

Then the pdf of Y is given by

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{1}{\pi(1 + y^2)} \qquad -\infty < y < \infty$$

Note that the r.v. Y is a Cauchy r.v. with parameter 1.

4.14. Let X be a continuous r.v. with the cdf $F_X(x)$. Let $Y = F_X(X)$. Show that Y is
a uniform r.v. over (0, 1).

Notice from the properties of a cdf that $y = F_X(x)$ is a monotonically
nondecreasing function. Since $0 \le F_X(x) \le 1$ for all real x, y takes on values
only on the interval (0, 1). Using Eq. (4.80) (Prob. 4.2), we have

$$f_Y(y) = f_X(x)\,\frac{1}{dy/dx} = f_X(x)\,\frac{1}{dF_X(x)/dx} = \frac{f_X(x)}{f_X(x)} = 1 \qquad 0 < y < 1$$

Hence, Y is a uniform r.v. over (0, 1).

4.15. Let Y be a uniform r.v. over (0, 1). Let F(x) be a function which has the
properties of the cdf of a continuous r.v. with F(a) = 0, F(b) = 1, and F(x)
strictly increasing for a < x < b, where a and b could be −∞ and ∞,
respectively. Let $X = F^{-1}(Y)$. Show that the cdf of X is F(x).

$$F_X(x) = P(X \le x) = P[F^{-1}(Y) \le x]$$

Since F(x) is strictly increasing, $F^{-1}(Y) \le x$ is equivalent to $Y \le F(x)$, and
hence

$$F_X(x) = P(X \le x) = P[Y \le F(x)]$$

Now Y is a uniform r.v. over (0, 1), and by Eq. (2.57),

$$F_Y(y) = P(Y \le y) = y \qquad 0 < y < 1$$

and accordingly,

$$F_X(x) = P(X \le x) = P[Y \le F(x)] = F(x) \qquad 0 < F(x) < 1$$

Note that this problem is the converse of Prob. 4.14.
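The result of Prob. 4.15 is the basis of a common way to generate samples with a prescribed cdf. The sketch below is a minimal illustration, assuming the hypothetical target $F(x) = 1 - e^{-\lambda x}$ (x > 0) with λ = 2, whose inverse is $F^{-1}(y) = -\ln(1 - y)/\lambda$; the sample size and seed are arbitrary.

```python
import math
import random

def sample_exponential(lam, rng):
    """X = F^{-1}(Y) with F(x) = 1 - exp(-lam x), x > 0, and Y uniform over (0, 1)."""
    y = rng.random()                  # Y lies in [0, 1)
    return -math.log(1.0 - y) / lam   # F^{-1}(y); 1 - y > 0, so the logarithm is finite

def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)

rng = random.Random(1)
lam = 2.0
samples = [sample_exponential(lam, rng) for _ in range(50_000)]

# Compare the empirical cdf with F(x) at one point.
print(empirical_cdf(samples, 0.5), 1.0 - math.exp(-lam * 0.5))
```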


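4.16. Let X be a continuous r.v. with the pdf

$$f_X(x) = \begin{cases} e^{-x} & x > 0 \\ 0 & x < 0 \end{cases}$$

Find the transformation Y = g(X) such that the pdf of Y is

$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}} & 0 < y < 1 \\[2mm] 0 & \text{otherwise} \end{cases}$$

The cdf of X is

$$F_X(x) = \int_{-\infty}^{x} f_X(\xi)\,d\xi = \begin{cases} \displaystyle\int_0^x e^{-\xi}\,d\xi = 1 - e^{-x} & x > 0 \\[2mm] 0 & x < 0 \end{cases}$$

Then from the result of Prob. 4.14, the r.v. $Z = 1 - e^{-X}$ is uniformly
distributed over (0, 1). Similarly, the cdf of Y is

$$F_Y(y) = \int_{-\infty}^{y} f_Y(\eta)\,d\eta = \begin{cases} 0 & y \le 0 \\[1mm] \displaystyle\int_0^y \frac{1}{2\sqrt{\eta}}\,d\eta = \sqrt{y} & 0 < y < 1 \\[2mm] 1 & y \ge 1 \end{cases}$$

and the r.v. $W = \sqrt{Y}$ is uniformly distributed over (0, 1). Thus, by setting Z
= W, the required transformation is $Y = (1 - e^{-X})^2$.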


Functions of Two Random Variables 


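4.17. Consider Z = X + Y. Show that if X and Y are independent Poisson r.v.’s with
parameters $\lambda_1$ and $\lambda_2$, respectively, then Z is also a Poisson r.v. with
parameter $\lambda_1 + \lambda_2$.

We can write the event

$$(X + Y = n) = \bigcup_{i=0}^{n} (X = i, Y = n - i)$$

where events (X = i, Y = n − i), i = 0, 1, ..., n, are disjoint. Since X and Y are
independent, by Eqs. (1.62) and (2.48), we have

$$\begin{aligned} P(Z = n) = P(X + Y = n) &= \sum_{i=0}^{n} P(X = i, Y = n - i) = \sum_{i=0}^{n} P(X = i)\,P(Y = n - i) \\ &= \sum_{i=0}^{n} e^{-\lambda_1} \frac{\lambda_1^i}{i!}\,e^{-\lambda_2} \frac{\lambda_2^{n-i}}{(n - i)!} = e^{-(\lambda_1 + \lambda_2)} \sum_{i=0}^{n} \frac{\lambda_1^i\,\lambda_2^{n-i}}{i!\,(n - i)!} \\ &= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!} \sum_{i=0}^{n} \frac{n!}{i!\,(n - i)!}\,\lambda_1^i\,\lambda_2^{n-i} = e^{-(\lambda_1 + \lambda_2)}\,\frac{(\lambda_1 + \lambda_2)^n}{n!} \end{aligned}$$

which indicates that Z = X + Y is a Poisson r.v. with parameter $\lambda_1 + \lambda_2$.

4.18. Consider two r.v.’s X and Y with joint pdf $f_{XY}(x, y)$. Let Z = X + Y.

(a) Determine the pdf of Z.
(b) Determine the pdf of Z if X and Y are independent.
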

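(a) The range $R_Z$ of Z corresponding to the event $(Z \le z) = (X + Y \le z)$ is the
set of points (x, y) which lie on and to the left of the line z = x + y (Fig. 4-6).
Thus, we have

[Fig. 4-6: the region $x + y \le z$]

$$F_Z(z) = P(X + Y \le z) = \int_{-\infty}^{\infty} \left[\int_{-\infty}^{z - x} f_{XY}(x, y)\,dy\right] dx \tag{4.95}$$

Then

$$f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} \left[\frac{d}{dz} \int_{-\infty}^{z - x} f_{XY}(x, y)\,dy\right] dx = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\,dx \tag{4.96}$$

(b) If X and Y are independent, then Eq. (4.96) reduces to

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx \tag{4.97}$$

The integral on the right-hand side of Eq. (4.97) is known as a
convolution of $f_X(z)$ and $f_Y(z)$. Since the convolution is commutative,
Eq. (4.97) can also be written as

$$f_Z(z) = \int_{-\infty}^{\infty} f_Y(y)\,f_X(z - y)\,dy \tag{4.98}$$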


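4.19. Using Eqs. (4.19) and (3.30), redo Prob. 4.18(a); that is, find the pdf of Z =
X + Y.

Let Z = X + Y and W = X. The transformation z = x + y, w = x has the inverse
transformation x = w, y = z − w, and

$$J(x, y) = \begin{vmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \\[2mm] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{vmatrix} = \begin{vmatrix} 1 & 1 \\ 1 & 0 \end{vmatrix} = -1$$

By Eq. (4.19), we obtain

$$f_{ZW}(z, w) = f_{XY}(w, z - w)$$

Hence, by Eq. (3.30), we get

$$f_Z(z) = \int_{-\infty}^{\infty} f_{ZW}(z, w)\,dw = \int_{-\infty}^{\infty} f_{XY}(w, z - w)\,dw = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\,dx$$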


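4.20. Suppose that X and Y are independent standard normal r.v.’s. Find the pdf of
Z = X + Y.

The pdf’s of X and Y are

$$f_X(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} \qquad f_Y(y) = \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}$$

Then, by Eq. (4.97), we have

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-x^2/2}\,e^{-(z - x)^2/2}\,dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-(z^2 - 2zx + 2x^2)/2}\,dx$$

Now, $z^2 - 2zx + 2x^2 = (\sqrt{2}\,x - z/\sqrt{2})^2 + z^2/2$, and we have

$$\begin{aligned} f_Z(z) &= \frac{1}{2\pi}\,e^{-z^2/4} \int_{-\infty}^{\infty} e^{-(\sqrt{2}\,x - z/\sqrt{2})^2/2}\,dx \\ &= \frac{1}{\sqrt{2\pi}\,\sqrt{2}}\,e^{-z^2/4} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du \end{aligned}$$

with the change of variables $u = \sqrt{2}\,x - z/\sqrt{2}$. Since the integrand is the
pdf of N(0; 1), the integral is equal to unity, and we get

$$f_Z(z) = \frac{1}{\sqrt{2\pi}\,\sqrt{2}}\,e^{-z^2/4} = \frac{1}{\sqrt{2\pi}\,\sqrt{2}}\,e^{-z^2/[2(\sqrt{2})^2]}$$

which is the pdf of N(0; 2). Thus, Z is a normal r.v. with zero mean and
variance 2.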


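4.21. Let X and Y be independent uniform r.v.’s over (0, 1). Find and sketch the
pdf of Z = X + Y.

Since X and Y are independent, we have

$$f_{XY}(x, y) = f_X(x)\,f_Y(y) = \begin{cases} 1 & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

The range of Z is (0, 2), and

$$F_Z(z) = P(X + Y \le z) = \iint_{x + y \le z} f_{XY}(x, y)\,dx\,dy = \iint_{x + y \le z} dx\,dy$$

[Fig. 4-7: (a) the region $x + y \le z$ for 0 < z < 1; (b) the region $x + y \le z$ for 1 < z < 2; (c) $f_Z(z)$]

If 0 < z < 1 [Fig. 4-7(a)],

$$F_Z(z) = \iint_{x + y \le z} dx\,dy = \text{shaded area} = \frac{z^2}{2}$$

and

$$f_Z(z) = \frac{d}{dz} F_Z(z) = z$$

If 1 < z < 2 [Fig. 4-7(b)],

$$F_Z(z) = \iint_{x + y \le z} dx\,dy = \text{shaded area} = 1 - \frac{(2 - z)^2}{2}$$

and

$$f_Z(z) = \frac{d}{dz} F_Z(z) = 2 - z$$

Hence,

$$f_Z(z) = \begin{cases} z & 0 < z < 1 \\ 2 - z & 1 < z < 2 \\ 0 & \text{otherwise} \end{cases}$$

which is sketched in Fig. 4-7(c). Note that the same result can be obtained by
the convolution of $f_X(z)$ and $f_Y(z)$.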
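The closing remark of Prob. 4.21 can be checked numerically by evaluating the convolution integral of Eq. (4.97) directly. The sketch below uses a simple midpoint rule; the integration limits and step count are assumptions chosen for this example, and the helper names are not from the text.

```python
def uniform_pdf(x):
    """pdf of a uniform r.v. over (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def convolve_at(f_x, f_y, z, lo=-1.0, hi=3.0, steps=40_000):
    """Midpoint-rule approximation of Eq. (4.97): the integral of f_X(x) f_Y(z - x) over (lo, hi)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += f_x(x) * f_y(z - x)
    return total * h

def triangular_pdf(z):
    """Closed form of f_Z(z) obtained in Prob. 4.21."""
    if 0.0 < z < 1.0:
        return z
    if 1.0 <= z < 2.0:
        return 2.0 - z
    return 0.0

for z in (0.25, 0.5, 1.0, 1.5):
    print(z, convolve_at(uniform_pdf, uniform_pdf, z), triangular_pdf(z))
```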


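4.22. Let X and Y be independent exponential r.v.’s with common parameter λ and
let Z = X + Y. Find $f_Z(z)$.

From Eq. (2.60) we have

$$f_X(x) = \lambda e^{-\lambda x} \quad x > 0 \qquad f_Y(y) = \lambda e^{-\lambda y} \quad y > 0$$

In order to apply Eq. (4.97) we need to rewrite $f_X(x)$ and $f_Y(y)$ as

$$f_X(x) = \lambda e^{-\lambda x} u(x) \quad -\infty < x < \infty \qquad f_Y(y) = \lambda e^{-\lambda y} u(y) \quad -\infty < y < \infty$$

where $u(\xi)$ is a unit step function defined as

$$u(\xi) = \begin{cases} 1 & \xi > 0 \\ 0 & \xi < 0 \end{cases} \tag{4.99}$$

Now by Eq. (4.97) we have

$$f_Z(z) = \int_{-\infty}^{\infty} \lambda e^{-\lambda x} u(x)\,\lambda e^{-\lambda(z - x)} u(z - x)\,dx$$

Using Eq. (4.99), we have

$$u(x)\,u(z - x) = \begin{cases} 1 & 0 < x < z \\ 0 & \text{otherwise} \end{cases}$$

Thus,

$$f_Z(z) = \lambda^2 e^{-\lambda z} \int_0^z dx = \lambda^2 z\,e^{-\lambda z}\,u(z)$$

Note that X and Y are gamma r.v.’s with parameters (1, λ) and Z is a gamma
r.v. with parameters (2, λ) (see Prob. 4.23).

4.23. Let X and Y be independent gamma r.v.’s with respective parameters (α, λ)
and (β, λ). Show that Z = X + Y is also a gamma r.v. with parameters (α + β,
λ).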


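From Eq. (2.65),

$$f_X(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)} & x > 0 \\[2mm] 0 & x < 0 \end{cases} \qquad f_Y(y) = \begin{cases} \dfrac{\lambda e^{-\lambda y} (\lambda y)^{\beta - 1}}{\Gamma(\beta)} & y > 0 \\[2mm] 0 & y < 0 \end{cases}$$

The range of Z is (0, ∞), and using Eq. (4.97), we have

$$\begin{aligned} f_Z(z) &= \frac{1}{\Gamma(\alpha)\Gamma(\beta)} \int_0^z \lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}\,\lambda e^{-\lambda(z - x)} [\lambda(z - x)]^{\beta - 1}\,dx \\ &= \frac{\lambda^{\alpha + \beta}}{\Gamma(\alpha)\Gamma(\beta)}\,e^{-\lambda z} \int_0^z x^{\alpha - 1} (z - x)^{\beta - 1}\,dx \end{aligned}$$

By the change of variable w = x/z, we have

$$f_Z(z) = \frac{\lambda^{\alpha + \beta}}{\Gamma(\alpha)\Gamma(\beta)}\,e^{-\lambda z} z^{\alpha + \beta - 1} \int_0^1 w^{\alpha - 1} (1 - w)^{\beta - 1}\,dw = k\,e^{-\lambda z} z^{\alpha + \beta - 1}$$

where k is a constant which does not depend on z. The value of k is
determined as follows: Using Eq. (2.22) and definition (2.66) of the gamma
function, we have

$$\begin{aligned} \int_{-\infty}^{\infty} f_Z(z)\,dz &= k \int_0^{\infty} e^{-\lambda z} z^{\alpha + \beta - 1}\,dz \\ &= \frac{k}{\lambda^{\alpha + \beta}} \int_0^{\infty} e^{-v} v^{\alpha + \beta - 1}\,dv \qquad (\lambda z = v) \\ &= \frac{k}{\lambda^{\alpha + \beta}}\,\Gamma(\alpha + \beta) = 1 \end{aligned}$$

Hence, $k = \lambda^{\alpha + \beta}/\Gamma(\alpha + \beta)$ and

$$f_Z(z) = \frac{\lambda^{\alpha + \beta}}{\Gamma(\alpha + \beta)}\,e^{-\lambda z} z^{\alpha + \beta - 1} = \frac{\lambda e^{-\lambda z} (\lambda z)^{\alpha + \beta - 1}}{\Gamma(\alpha + \beta)} \qquad z > 0$$

which indicates that Z is a gamma r.v. with parameters (α + β, λ).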


4.24. Let X and Y be two r.v.’s with joint pdf $f_{XY}(x, y)$, and let Z = X − Y.

(a) Find $f_Z(z)$.
(b) Find $f_Z(z)$ if X and Y are independent.

(a) From Eq. (4.12) and Fig. 4-8 we have

[Fig. 4-8: the region $x - y \le z$]

$$F_Z(z) = P(X - Y \le z) = \int_{y = -\infty}^{\infty} \int_{x = -\infty}^{y + z} f_{XY}(x, y)\,dx\,dy$$

Then

$$f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} \left[\frac{\partial}{\partial z} \int_{-\infty}^{y + z} f_{XY}(x, y)\,dx\right] dy = \int_{-\infty}^{\infty} f_{XY}(y + z, y)\,dy \tag{4.100}$$

(b) If X and Y are independent, by Eq. (3.32), Eq. (4.100) reduces to

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(y + z)\,f_Y(y)\,dy \tag{4.101}$$

which is the convolution of $f_Y(-z)$ with $f_X(z)$.


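4.25. Consider two r.v.’s X and Y with joint pdf $f_{XY}(x, y)$. Determine the pdf of Z
= XY.

Let Z = XY and W = X. The transformation z = xy, w = x has the inverse
transformation x = w, y = z/w, and

$$\bar{J}(z, w) = \begin{vmatrix} \dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w} \\[2mm] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w} \end{vmatrix} = \begin{vmatrix} 0 & 1 \\[1mm] \dfrac{1}{w} & -\dfrac{z}{w^2} \end{vmatrix} = -\frac{1}{w}$$

Thus, by Eq. (4.23), we obtain

$$f_{ZW}(z, w) = \left|\frac{1}{w}\right| f_{XY}\left(w, \frac{z}{w}\right) \tag{4.102}$$

and the marginal pdf of Z is

$$f_Z(z) = \int_{-\infty}^{\infty} \left|\frac{1}{w}\right| f_{XY}\left(w, \frac{z}{w}\right) dw \tag{4.103}$$
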

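4.26. Let X and Y be independent uniform r.v.’s over (0, 1). Find the pdf of Z =
XY.

We have

$$f_{XY}(x, y) = \begin{cases} 1 & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

The range of Z is (0, 1). Then

$$f_{XY}\left(w, \frac{z}{w}\right) = \begin{cases} 1 & 0 < w < 1,\ 0 < z/w < 1 \\ 0 & \text{otherwise} \end{cases}$$

or

$$f_{XY}\left(w, \frac{z}{w}\right) = \begin{cases} 1 & 0 < z < w < 1 \\ 0 & \text{otherwise} \end{cases}$$

By Eq. (4.103),

$$f_Z(z) = \int_z^1 \frac{1}{w}\,dw = -\ln z \qquad 0 < z < 1$$

Thus,

$$f_Z(z) = \begin{cases} -\ln z & 0 < z < 1 \\ 0 & \text{otherwise} \end{cases}$$

4.27. Consider two r.v.’s X and Y with joint pdf $f_{XY}(x, y)$. Determine the pdf of Z
= X/Y.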


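Let Z = X/Y and W = Y. The transformation z = x/y, w = y has the inverse
transformation x = zw, y = w, and

$$\bar{J}(z, w) = \begin{vmatrix} \dfrac{\partial x}{\partial z} & \dfrac{\partial x}{\partial w} \\[2mm] \dfrac{\partial y}{\partial z} & \dfrac{\partial y}{\partial w} \end{vmatrix} = \begin{vmatrix} w & z \\ 0 & 1 \end{vmatrix} = w$$

Thus, by Eq. (4.23), we obtain

$$f_{ZW}(z, w) = |w|\,f_{XY}(zw, w) \tag{4.104}$$

and the marginal pdf of Z is

$$f_Z(z) = \int_{-\infty}^{\infty} |w|\,f_{XY}(zw, w)\,dw \tag{4.105}$$
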
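4.28. Let X and Y be independent standard normal r.v.’s. Find the pdf of Z = X/Y.

Since X and Y are independent, using Eq. (4.105), we have

$$\begin{aligned} f_Z(z) &= \int_{-\infty}^{\infty} |w|\,f_X(zw)\,f_Y(w)\,dw = \frac{1}{2\pi} \int_{-\infty}^{\infty} |w|\,e^{-w^2(1 + z^2)/2}\,dw \\ &= \frac{1}{2\pi} \int_0^{\infty} w\,e^{-w^2(1 + z^2)/2}\,dw - \frac{1}{2\pi} \int_{-\infty}^0 w\,e^{-w^2(1 + z^2)/2}\,dw \\ &= \frac{1}{\pi} \int_0^{\infty} w\,e^{-w^2(1 + z^2)/2}\,dw = \frac{1}{\pi(1 + z^2)} \qquad -\infty < z < \infty \end{aligned}$$

which is the pdf of a Cauchy r.v. with parameter 1.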
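A Monte Carlo experiment gives a rough check of this result. The sketch below draws pairs of independent standard normal samples, forms their ratio, and compares the empirical cdf with the Cauchy cdf $1/2 + (1/\pi)\tan^{-1} z$ obtained in Prob. 4.13; the trial count and seed are arbitrary assumptions.

```python
import math
import random

def cauchy_cdf(z):
    """cdf of a Cauchy r.v. with parameter 1: 1/2 + (1/pi) arctan z."""
    return 0.5 + math.atan(z) / math.pi

def ratio_samples(trials, seed=0):
    """Samples of Z = X/Y for independent X, Y = N(0; 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if y != 0.0:  # Y = 0 has probability zero; the guard only avoids a division error
            out.append(x / y)
    return out

def empirical_cdf(samples, z):
    """Fraction of samples that are <= z."""
    return sum(1 for s in samples if s <= z) / len(samples)

zs = ratio_samples(50_000)
for z in (-1.0, 0.0, 1.0):
    print(z, empirical_cdf(zs, z), cauchy_cdf(z))
```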


4.29. Let X and Y be two r.v.’s with joint pdf $f_{XY}(x, y)$ and joint cdf $F_{XY}(x, y)$. Let
Z = max(X, Y).

(a) Find the cdf of Z.
(b) Find the pdf of Z if X and Y are independent.

(a) The region in the xy plane corresponding to the event $\{\max(X, Y) \le z\}$
is shown as the shaded area in Fig. 4-9. Then

[Fig. 4-9: the region $x \le z$, $y \le z$ with corner at (z, z)]

$$F_Z(z) = P(Z \le z) = P(X \le z, Y \le z) = F_{XY}(z, z) \tag{4.106}$$

(b) If X and Y are independent, then

$$F_Z(z) = F_X(z)\,F_Y(z)$$

and differentiating with respect to z gives

$$f_Z(z) = f_X(z)\,F_Y(z) + F_X(z)\,f_Y(z) \tag{4.107}$$


4.30. Let X and Y be two r.v.’s with joint pdf $f_{XY}(x, y)$ and joint cdf $F_{XY}(x, y)$. Let
W = min(X, Y).

(a) Find the cdf of W.
(b) Find the pdf of W if X and Y are independent.

(a) The region in the xy plane corresponding to the event $\{\min(X, Y) \le w\}$
is shown as the shaded area in Fig. 4-10. Then

$$\begin{aligned} P(W \le w) &= P\{(X \le w) \cup (Y \le w)\} \\ &= P(X \le w) + P(Y \le w) - P\{(X \le w) \cap (Y \le w)\} \end{aligned}$$

Thus,

$$F_W(w) = F_X(w) + F_Y(w) - F_{XY}(w, w) \tag{4.108}$$

(b) If X and Y are independent, then

$$F_W(w) = F_X(w) + F_Y(w) - F_X(w)\,F_Y(w)$$

and differentiating with respect to w gives

$$\begin{aligned} f_W(w) &= f_X(w) + f_Y(w) - f_X(w)\,F_Y(w) - F_X(w)\,f_Y(w) \\ &= f_X(w)[1 - F_Y(w)] + f_Y(w)[1 - F_X(w)] \end{aligned} \tag{4.109}$$
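Equations (4.107) and (4.109) are easy to evaluate once the marginal pdf's and cdf's are available. The sketch below assumes, purely for illustration, that X and Y are independent exponential r.v.'s with hypothetical parameters a = 1.5 and b = 2.5; in that case Eq. (4.109) should reduce to the pdf of an exponential r.v. with parameter a + b.

```python
import math

def exp_pdf(lam):
    """pdf of an exponential r.v. with parameter lam."""
    return lambda t: lam * math.exp(-lam * t) if t > 0.0 else 0.0

def exp_cdf(lam):
    """cdf of an exponential r.v. with parameter lam."""
    return lambda t: 1.0 - math.exp(-lam * t) if t > 0.0 else 0.0

def pdf_max(f_x, F_x, f_y, F_y, z):
    """Eq. (4.107): f_Z(z) = f_X(z) F_Y(z) + F_X(z) f_Y(z) for Z = max(X, Y)."""
    return f_x(z) * F_y(z) + F_x(z) * f_y(z)

def pdf_min(f_x, F_x, f_y, F_y, w):
    """Eq. (4.109): f_W(w) = f_X(w)[1 - F_Y(w)] + f_Y(w)[1 - F_X(w)] for W = min(X, Y)."""
    return f_x(w) * (1.0 - F_y(w)) + f_y(w) * (1.0 - F_x(w))

a, b = 1.5, 2.5  # hypothetical parameters
w = 0.3
print(pdf_min(exp_pdf(a), exp_cdf(a), exp_pdf(b), exp_cdf(b), w))
print((a + b) * math.exp(-(a + b) * w))  # exponential pdf with parameter a + b
```

Adding the two formulas gives $f_Z + f_W = f_X + f_Y$, which is a convenient consistency check for any pair of marginals.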


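4.31. Let X and Y be two r.v.’s with joint pdf $f_{XY}(x, y)$. Let $Z = X^2 + Y^2$. Find
$f_Z(z)$.

As shown in Fig. 4-11, $D_Z$ $(X^2 + Y^2 \le z)$ represents the area of a circle with
radius $\sqrt{z}$.

[Fig. 4-11]

Hence, by Eq. (4.12)

$$F_Z(z) = \int_{y = -\sqrt{z}}^{\sqrt{z}} \int_{x = -\sqrt{z - y^2}}^{\sqrt{z - y^2}} f_{XY}(x, y)\,dx\,dy$$

and

$$\begin{aligned} f_Z(z) = \frac{d}{dz} F_Z(z) &= \int_{-\sqrt{z}}^{\sqrt{z}} \left[\frac{\partial}{\partial z} \int_{-\sqrt{z - y^2}}^{\sqrt{z - y^2}} f_{XY}(x, y)\,dx\right] dy \\ &= \int_{-\sqrt{z}}^{\sqrt{z}} \frac{1}{2\sqrt{z - y^2}} \left[f_{XY}\left(\sqrt{z - y^2}, y\right) + f_{XY}\left(-\sqrt{z - y^2}, y\right)\right] dy \end{aligned} \tag{4.110}$$
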
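4.32. Let X and Y be independent normal r.v.’s with $\mu_X = \mu_Y = 0$ and $\sigma_X^2 = \sigma_Y^2 = \sigma^2$. Let $Z = X^2 + Y^2$. Find $f_Z(z)$.

Since X and Y are independent, from Eqs. (3.32) and (2.71) we have
$$ f_{XY}(x, y) = f_X(x) f_Y(y) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2+y^2)/(2\sigma^2)} \tag{4.111} $$
and
$$ f_{XY}\!\left(\sqrt{z-y^2},\, y\right) = f_{XY}\!\left(-\sqrt{z-y^2},\, y\right) = \frac{1}{2\pi\sigma^2}\, e^{-(z-y^2+y^2)/(2\sigma^2)} = \frac{1}{2\pi\sigma^2}\, e^{-z/(2\sigma^2)} $$
Thus, using Eq. (4.110), we obtain
$$ f_Z(z) = \int_{-\sqrt{z}}^{\sqrt{z}} \frac{1}{2\sqrt{z-y^2}} \left( \frac{1}{\pi\sigma^2}\, e^{-z/(2\sigma^2)} \right) dy = \frac{1}{\pi\sigma^2}\, e^{-z/(2\sigma^2)} \int_0^{\sqrt{z}} \frac{dy}{\sqrt{z-y^2}} $$
Let $y = \sqrt{z}\sin\theta$. Then $\sqrt{z-y^2} = \sqrt{z(1-\sin^2\theta)} = \sqrt{z}\cos\theta$ and $dy = \sqrt{z}\cos\theta\,d\theta$, and
$$ \int_0^{\sqrt{z}} \frac{dy}{\sqrt{z-y^2}} = \int_0^{\pi/2} \frac{\sqrt{z}\cos\theta}{\sqrt{z}\cos\theta}\,d\theta = \frac{\pi}{2} $$
Hence,
$$ f_Z(z) = \frac{1}{2\sigma^2}\, e^{-z/(2\sigma^2)} \qquad z > 0 \tag{4.112} $$
which indicates that Z is an exponential r.v. with parameter $1/(2\sigma^2)$.


4.33. Let X and Y be two r.v.’s with joint pdf $f_{XY}(x, y)$. Let
$$ R = \sqrt{X^2 + Y^2} \qquad \Theta = \tan^{-1}\frac{Y}{X} \tag{4.113} $$
Find $f_{R\Theta}(r, \theta)$ in terms of $f_{XY}(x, y)$.


We assume that r ≥ 0 and 0 ≤ θ < 2π. With this assumption, the transformation
$$ \sqrt{x^2 + y^2} = r \qquad \tan^{-1}\frac{y}{x} = \theta $$
has the inverse transformation
$$ x = r\cos\theta \qquad y = r\sin\theta $$
Since
$$ \bar{J}(r, \theta) = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[4pt] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r $$
by Eq. (4.23) we obtain
$$ f_{R\Theta}(r, \theta) = r f_{XY}(r\cos\theta,\, r\sin\theta) \tag{4.114} $$


4.34. A voltage V is a function of time t and is given by
$$ V(t) = X\cos\omega t + Y\sin\omega t \tag{4.115} $$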


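in which ω is a constant angular frequency and X = Y = N(0; σ²) and they are independent.


(a) Show that V(t) may be written as
$$ V(t) = R\cos(\omega t - \Theta) \tag{4.116} $$
(b) Find the pdf’s of r.v.’s R and Θ and show that R and Θ are independent.


(a) We have
$$ \begin{aligned} V(t) &= X\cos\omega t + Y\sin\omega t \\ &= \sqrt{X^2+Y^2}\left( \frac{X}{\sqrt{X^2+Y^2}}\cos\omega t + \frac{Y}{\sqrt{X^2+Y^2}}\sin\omega t \right) \\ &= \sqrt{X^2+Y^2}\,(\cos\Theta\cos\omega t + \sin\Theta\sin\omega t) \\ &= R\cos(\omega t - \Theta) \end{aligned} $$
where
$$ R = \sqrt{X^2+Y^2} \qquad \text{and} \qquad \Theta = \tan^{-1}\frac{Y}{X} $$
which is the transformation (4.113).
(b) Since X = Y = N(0; σ²) and they are independent, we have
$$ f_{XY}(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2+y^2)/(2\sigma^2)} $$
Thus, using Eq. (4.114), we get
$$ f_{R\Theta}(r, \theta) = r f_{XY}(r\cos\theta,\, r\sin\theta) = \frac{r}{2\pi\sigma^2}\, e^{-r^2/(2\sigma^2)} \tag{4.117} $$
Now
$$ f_R(r) = \int_0^{2\pi} f_{R\Theta}(r, \theta)\,d\theta = \frac{r}{2\pi\sigma^2}\, e^{-r^2/(2\sigma^2)} \int_0^{2\pi} d\theta = \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)} \qquad r \ge 0 \tag{4.118} $$
$$ f_\Theta(\theta) = \int_0^{\infty} f_{R\Theta}(r, \theta)\,dr = \frac{1}{2\pi\sigma^2} \int_0^{\infty} r\, e^{-r^2/(2\sigma^2)}\,dr = \frac{1}{2\pi} \qquad 0 \le \theta < 2\pi \tag{4.119} $$
and $f_{R\Theta}(r, \theta) = f_R(r) f_\Theta(\theta)$; hence, R and Θ are independent.


Note that R is a Rayleigh r.v. (Prob. 2.26), and Θ is a uniform r.v. over
(0, 2π).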
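
The Rayleigh and uniform marginals can be checked by simulation. This is a minimal sketch under stated assumptions: the helper names, the value σ = 2, and the sample size are illustrative choices, and the reference values E(R) = σ√(π/2), E(R²) = 2σ², and E(Θ) = π are the standard moments of the Rayleigh and uniform distributions.

```python
import math
import random

def polar_samples(n, sigma=1.0, seed=1):
    """Draw (R, Theta) from independent X, Y ~ N(0; sigma^2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        y = rng.gauss(0.0, sigma)
        r = math.hypot(x, y)
        theta = math.atan2(y, x) % (2.0 * math.pi)  # map the angle into [0, 2*pi)
        out.append((r, theta))
    return out

def mean(values):
    values = list(values)
    return sum(values) / len(values)

sigma = 2.0
samples = polar_samples(50_000, sigma=sigma)
mean_r = mean(r for r, _ in samples)           # compare with sigma*sqrt(pi/2)
mean_r2 = mean(r * r for r, _ in samples)      # compare with 2*sigma^2
mean_theta = mean(t for _, t in samples)       # compare with pi
mean_r_theta = mean(r * t for r, t in samples) # compare with E(R)E(Theta)
print(mean_r, mean_r2, mean_theta, mean_r_theta - mean_r * mean_theta)
```

A product moment close to E(R)E(Θ) is consistent with independence but does not prove it; the factorization of the joint pdf is what establishes it.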


Functions of N Random Variables 


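4.35. Let X, Y, and Z be independent standard normal r.v.’s. Let $W = (X^2 + Y^2 + Z^2)^{1/2}$. Find the pdf of W.


We have
$$ f_{XYZ}(x, y, z) = f_X(x) f_Y(y) f_Z(z) = \frac{1}{(2\pi)^{3/2}}\, e^{-(x^2+y^2+z^2)/2} $$
and
$$ F_W(w) = P(W \le w) = P(X^2 + Y^2 + Z^2 \le w^2) = \iiint_{R_W} \frac{1}{(2\pi)^{3/2}}\, e^{-(x^2+y^2+z^2)/2}\,dx\,dy\,dz $$
where $R_W = \{(x, y, z): x^2 + y^2 + z^2 \le w^2\}$. Using spherical coordinates (Fig. 4-12), we have


Fig. 4-12 Spherical coordinates.


$$ x^2 + y^2 + z^2 = r^2 \qquad dx\,dy\,dz = r^2 \sin\theta\,dr\,d\theta\,d\varphi $$
and
$$ \begin{aligned} F_W(w) &= \frac{1}{(2\pi)^{3/2}} \int_0^{2\pi}\!\int_0^{\pi}\!\int_0^{w} e^{-r^2/2}\, r^2 \sin\theta\,dr\,d\theta\,d\varphi \\ &= \frac{1}{(2\pi)^{3/2}} \int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\,d\theta \int_0^{w} r^2 e^{-r^2/2}\,dr \\ &= \frac{4\pi}{(2\pi)^{3/2}} \int_0^{w} r^2 e^{-r^2/2}\,dr \end{aligned} \tag{4.120} $$
Thus, the pdf of W is
$$ f_W(w) = \frac{d}{dw} F_W(w) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\, w^2 e^{-w^2/2} & w > 0 \\ 0 & w < 0 \end{cases} \tag{4.121} $$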


4.36. Let $X_1, \ldots, X_n$ be n independent r.v.’s each with the identical pdf f(x). Let Z = max($X_1, \ldots, X_n$). Find the pdf of Z.


The probability P(z < Z < z + dz) is equal to the probability that one of the
r.v.’s falls in (z, z + dz) and all others are less than z. The probability that one
of $X_i$ (i = 1, ..., n) falls in (z, z + dz) and all others are less than z is
$$ f(z)\,dz \left( \int_{-\infty}^{z} f(x)\,dx \right)^{n-1} $$
Since there are n ways of choosing the variables to be maximum, we have
$$ f_Z(z) = n f(z) \left( \int_{-\infty}^{z} f(x)\,dx \right)^{n-1} = n f(z) [F(z)]^{n-1} \tag{4.122} $$
When n = 2, Eq. (4.122) reduces to
$$ f_Z(z) = 2 f(z) \int_{-\infty}^{z} f(x)\,dx = 2 f(z) F(z) \tag{4.123} $$
which is the same as Eq. (4.107) (Prob. 4.29) with $f_X(z) = f_Y(z) = f(z)$ and
$F_X(z) = F_Y(z) = F(z)$.


4.37. Let $X_1, \ldots, X_n$ be n independent r.v.’s each with the identical pdf f(x). Let W
= min($X_1, \ldots, X_n$). Find the pdf of W.


The probability P(w < W < w + dw) is equal to the probability that one of the
r.v.’s falls in (w, w + dw) and all others are greater than w. The probability
that one of $X_i$ (i = 1, ..., n) falls in (w, w + dw) and all others are greater than
w is
$$ f(w)\,dw \left( \int_{w}^{\infty} f(x)\,dx \right)^{n-1} $$
Since there are n ways of choosing the variables to be minimum, we have
$$ f_W(w) = n f(w) \left( \int_{w}^{\infty} f(x)\,dx \right)^{n-1} = n f(w) [1 - F(w)]^{n-1} \tag{4.124} $$
When n = 2, Eq. (4.124) reduces to
$$ f_W(w) = 2 f(w) \int_{w}^{\infty} f(x)\,dx = 2 f(w) [1 - F(w)] \tag{4.125} $$
which is the same as Eq. (4.109) (Prob. 4.30) with $f_X(w) = f_Y(w) = f(w)$ and
$F_X(w) = F_Y(w) = F(w)$.
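
Integrating Eqs. (4.122) and (4.124) gives the cdf’s $[F(z)]^n$ for the maximum and $1 - [1 - F(w)]^n$ for the minimum. The sketch below compares these with simulated extremes; the choice of uniform r.v.’s over (0, 1), n = 5, and all function names are assumptions made for illustration.

```python
import random

def max_cdf(F, n):
    """cdf of the maximum of n i.i.d. r.v.'s with cdf F: [F(z)]^n."""
    return lambda z: F(z) ** n

def min_cdf(F, n):
    """cdf of the minimum of n i.i.d. r.v.'s with cdf F: 1 - [1 - F(w)]^n."""
    return lambda w: 1.0 - (1.0 - F(w)) ** n

def simulate_extremes(n, trials, seed=2):
    """Return lists of maxima and minima of n uniform(0, 1) draws per trial."""
    rng = random.Random(seed)
    maxima, minima = [], []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        maxima.append(max(xs))
        minima.append(min(xs))
    return maxima, minima

def fraction_at_most(samples, t):
    return sum(1 for s in samples if s <= t) / len(samples)

def uniform_cdf(x):
    """cdf of a uniform r.v. over (0, 1)."""
    return min(max(x, 0.0), 1.0)

n = 5
maxima, minima = simulate_extremes(n, trials=40_000)
print(fraction_at_most(maxima, 0.8), max_cdf(uniform_cdf, n)(0.8))
print(fraction_at_most(minima, 0.2), min_cdf(uniform_cdf, n)(0.2))
```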


4.38. Let $X_i$, i = 1, ..., n, be n independent gamma r.v.’s with respective
parameters $(\alpha_i, \lambda)$, i = 1, ..., n. Let
$$ Y = X_1 + \cdots + X_n = \sum_{i=1}^{n} X_i $$
Show that Y is also a gamma r.v. with parameters $\left(\sum_{i=1}^{n} \alpha_i,\ \lambda\right)$.

We prove this proposition by induction. Let us assume that the proposition is
true for n = k; that is,
$$ Z = X_1 + \cdots + X_k = \sum_{i=1}^{k} X_i $$
is a gamma r.v. with parameters $(\beta, \lambda) = \left(\sum_{i=1}^{k} \alpha_i,\ \lambda\right)$. Let
$$ W = Z + X_{k+1} = \sum_{i=1}^{k+1} X_i $$
Then, by the result of Prob. 4.23, we see that W is a gamma r.v. with
parameters $(\beta + \alpha_{k+1}, \lambda) = \left(\sum_{i=1}^{k+1} \alpha_i,\ \lambda\right)$. Hence, the proposition is true for
n = k + 1. Next, by the result of Prob. 4.23, the proposition is true for n = 2.
Thus, we conclude that the proposition is true for any n ≥ 2.


4.39. Let $X_1, \ldots, X_n$ be n independent exponential r.v.’s each with parameter λ. Let
$$ Y = X_1 + \cdots + X_n = \sum_{i=1}^{n} X_i $$
Show that Y is a gamma r.v. with parameters (n, λ).


We note that an exponential r.v. with parameter λ is a gamma r.v. with
parameters (1, λ). Thus, from the result of Prob. 4.38 and setting $\alpha_i = 1$, we
conclude that Y is a gamma r.v. with parameters (n, λ).


4.40. Let $Z_1, \ldots, Z_n$ be n independent standard normal r.v.’s. Let
$$ Y = Z_1^2 + \cdots + Z_n^2 = \sum_{i=1}^{n} Z_i^2 $$
Find the pdf of Y.


Let $Y_i = Z_i^2$. Then by Eq. (4.91) (Prob. 4.7), the pdf of $Y_i$ is
$$ f_{Y_i}(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi y}}\, e^{-y/2} & y > 0 \\ 0 & y < 0 \end{cases} $$
Now, using Eq. (2.96), we can rewrite
$$ \frac{1}{\sqrt{2\pi y}}\, e^{-y/2} = \frac{\frac{1}{2}\, e^{-y/2} (y/2)^{1/2 - 1}}{\sqrt{\pi}} = \frac{\frac{1}{2}\, e^{-y/2} (y/2)^{1/2 - 1}}{\Gamma\!\left(\frac{1}{2}\right)} $$
and we recognize the above as the pdf of a gamma r.v. with parameters $\left(\frac{1}{2}, \frac{1}{2}\right)$
[Eq. (2.65)]. Thus, by the result of Prob. 4.38, we conclude that Y is the
gamma r.v. with parameters $\left(n/2, \frac{1}{2}\right)$ and
$$ f_Y(y) = \begin{cases} \dfrac{\frac{1}{2}\, e^{-y/2} (y/2)^{n/2 - 1}}{\Gamma(n/2)} = \dfrac{y^{n/2 - 1} e^{-y/2}}{2^{n/2}\, \Gamma(n/2)} & y > 0 \\ 0 & y < 0 \end{cases} \tag{4.126} $$
When n is an even integer, Γ(n/2) = [(n/2) − 1]!, whereas when n is odd,
Γ(n/2) can be obtained from Γ(α) = (α − 1) Γ(α − 1) [Eq. (2.97)] and
$\Gamma\!\left(\frac{1}{2}\right) = \sqrt{\pi}$ [Eq. (2.99)].


Note that Equation (4.126) is referred to as the chi-square (χ²) density
function with n degrees of freedom, and Y is known as the chi-square (χ²)
r.v. with n degrees of freedom. It is important to recognize that the sum of
the squares of n independent standard normal r.v.’s is a chi-square r.v. with n
degrees of freedom. The chi-square distribution plays an important role in
statistical analysis.


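4.41. Let $X_1$, $X_2$, and $X_3$ be independent standard normal r.v.’s. Let
$$ Y_1 = X_1 + X_2 + X_3 \qquad Y_2 = X_1 - X_2 \qquad Y_3 = X_2 - X_3 $$
Determine the joint pdf of $Y_1$, $Y_2$, and $Y_3$.


Let
$$ \begin{aligned} y_1 &= x_1 + x_2 + x_3 \\ y_2 &= x_1 - x_2 \\ y_3 &= x_2 - x_3 \end{aligned} \tag{4.127} $$
By Eq. (4.32), the Jacobian of transformation (4.127) is
$$ J(x_1, x_2, x_3) = \begin{vmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{vmatrix} = 3 $$
Thus, solving the system (4.127), we get
$$ \begin{aligned} x_1 &= \tfrac{1}{3}(y_1 + 2y_2 + y_3) \\ x_2 &= \tfrac{1}{3}(y_1 - y_2 + y_3) \\ x_3 &= \tfrac{1}{3}(y_1 - y_2 - 2y_3) \end{aligned} $$
Then by Eq. (4.31), we obtain
$$ f_{Y_1 Y_2 Y_3}(y_1, y_2, y_3) = \frac{1}{3}\, f_{X_1 X_2 X_3}\!\left( \frac{y_1 + 2y_2 + y_3}{3},\ \frac{y_1 - y_2 + y_3}{3},\ \frac{y_1 - y_2 - 2y_3}{3} \right) \tag{4.128} $$
Since $X_1$, $X_2$, and $X_3$ are independent,
$$ f_{X_1 X_2 X_3}(x_1, x_2, x_3) = \prod_{i=1}^{3} f_{X_i}(x_i) = \frac{1}{(2\pi)^{3/2}}\, e^{-(x_1^2 + x_2^2 + x_3^2)/2} $$
Hence,
$$ f_{Y_1 Y_2 Y_3}(y_1, y_2, y_3) = \frac{1}{3(2\pi)^{3/2}}\, e^{-q(y_1, y_2, y_3)/2} $$
where
$$ \begin{aligned} q(y_1, y_2, y_3) &= \left( \frac{y_1 + 2y_2 + y_3}{3} \right)^2 + \left( \frac{y_1 - y_2 + y_3}{3} \right)^2 + \left( \frac{y_1 - y_2 - 2y_3}{3} \right)^2 \\ &= \frac{1}{3} y_1^2 + \frac{2}{3} y_2^2 + \frac{2}{3} y_3^2 + \frac{2}{3} y_2 y_3 \end{aligned} $$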


Expectation 


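4.42. Let X be a uniform r.v. over (0, 1) and $Y = e^X$.
(a) Find E(Y) by using $f_Y(y)$.
(b) Find E(Y) by using $f_X(x)$.


(a) From Eq. (4.92) (Prob. 4.9),
$$ f_Y(y) = \begin{cases} \dfrac{1}{y} & 1 < y < e \\ 0 & \text{otherwise} \end{cases} $$
Hence,
$$ E(Y) = \int_{-\infty}^{\infty} y f_Y(y)\,dy = \int_1^e dy = e - 1 $$
(b) The pdf of X is
$$ f_X(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} $$
Then, by Eq. (4.33),
$$ E(Y) = \int_{-\infty}^{\infty} e^x f_X(x)\,dx = \int_0^1 e^x\,dx = e - 1 $$
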
4.43. Let Y = aX + b, where a and b are constants. Show that
(a)
$$ E(Y) = E(aX + b) = aE(X) + b \tag{4.129} $$
(b)
$$ \mathrm{Var}(Y) = \mathrm{Var}(aX + b) = a^2\, \mathrm{Var}(X) \tag{4.130} $$

We verify for the continuous case. The proof for the discrete case is similar.
(a) By Eq. (4.33),
$$ E(Y) = E(aX + b) = \int_{-\infty}^{\infty} (ax + b) f_X(x)\,dx = a \int_{-\infty}^{\infty} x f_X(x)\,dx + b \int_{-\infty}^{\infty} f_X(x)\,dx = aE(X) + b $$
(b) Using Eq. (4.129), we have
$$ \mathrm{Var}(Y) = \mathrm{Var}(aX + b) = E\{(aX + b - [aE(X) + b])^2\} = E\{a^2 [X - E(X)]^2\} = a^2 E\{[X - E(X)]^2\} = a^2\, \mathrm{Var}(X) $$


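4.44. Verify Eq. (4.39).
Using Eqs. (3.58) and (3.38), we have
$$ \begin{aligned} E[E(Y \mid X)] &= \int_{-\infty}^{\infty} E(Y \mid x) f_X(x)\,dx = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x)\,dy \right] f_X(x)\,dx \\ &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y\, \frac{f_{XY}(x, y)}{f_X(x)}\, f_X(x)\,dx\,dy = \int_{-\infty}^{\infty} y \left[ \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx \right] dy \\ &= \int_{-\infty}^{\infty} y f_Y(y)\,dy = E[Y] \end{aligned} $$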


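4.45. Let Z = aX + bY, where a and b are constants. Show that
$$ E(Z) = E(aX + bY) = aE(X) + bE(Y) \tag{4.131} $$

We verify for the continuous case. The proof for the discrete case is similar.
$$ \begin{aligned} E(Z) = E(aX + bY) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (ax + by) f_{XY}(x, y)\,dx\,dy \\ &= a \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x f_{XY}(x, y)\,dx\,dy + b \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y f_{XY}(x, y)\,dx\,dy \\ &= a \int_{-\infty}^{\infty} x \left[ \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy \right] dx + b \int_{-\infty}^{\infty} y \left[ \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx \right] dy \\ &= a \int_{-\infty}^{\infty} x f_X(x)\,dx + b \int_{-\infty}^{\infty} y f_Y(y)\,dy = aE(X) + bE(Y) \end{aligned} $$
Note that Eq. (4.131) (the linearity of E) can be easily extended to n r.v.’s:
$$ E\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i E(X_i) \tag{4.132} $$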


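4.46. Let Y = aX + b.
(a) Find the covariance of X and Y.
(b) Find the correlation coefficient of X and Y.


(a) By Eq. (4.131), we have
$$ E(XY) = E[X(aX + b)] = aE(X^2) + bE(X) $$
$$ E(Y) = E(aX + b) = aE(X) + b $$
Thus, the covariance of X and Y is [Eq. (3.51)]
$$ \begin{aligned} \mathrm{Cov}(X, Y) = \sigma_{XY} &= E(XY) - E(X)E(Y) \\ &= aE(X^2) + bE(X) - E(X)[aE(X) + b] \\ &= a\{E(X^2) - [E(X)]^2\} = a\sigma_X^2 \end{aligned} \tag{4.133} $$
(b) By Eq. (4.130), we have $\sigma_Y = |a|\,\sigma_X$. Thus, the correlation coefficient of
X and Y is [Eq. (3.53)]
$$ \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{a\sigma_X^2}{\sigma_X |a| \sigma_X} = \frac{a}{|a|} = \begin{cases} 1 & a > 0 \\ -1 & a < 0 \end{cases} \tag{4.134} $$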


4.47. Verify Eq. (4.36). 


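Since X and Y are independent, we have
$$ \begin{aligned} E[g(X)h(Y)] &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x) h(y) f_{XY}(x, y)\,dx\,dy \\ &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x) h(y) f_X(x) f_Y(y)\,dx\,dy \\ &= \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \int_{-\infty}^{\infty} h(y) f_Y(y)\,dy \\ &= E[g(X)]\,E[h(Y)] \end{aligned} $$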


The proof for the discrete case is similar. 


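4.48. Let X and Y be defined by
$$ X = \cos\Theta \qquad Y = \sin\Theta $$
where Θ is a random variable uniformly distributed over (0, 2π).
(a) Show that X and Y are uncorrelated.
(b) Show that X and Y are not independent.


(a) We have
$$ f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi} & 0 < \theta < 2\pi \\ 0 & \text{otherwise} \end{cases} $$
Then,
$$ E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^{2\pi} \cos\theta\, f_\Theta(\theta)\,d\theta = \frac{1}{2\pi} \int_0^{2\pi} \cos\theta\,d\theta = 0 $$
Similarly,
$$ E(Y) = \frac{1}{2\pi} \int_0^{2\pi} \sin\theta\,d\theta = 0 $$
$$ E(XY) = \frac{1}{2\pi} \int_0^{2\pi} \cos\theta \sin\theta\,d\theta = \frac{1}{4\pi} \int_0^{2\pi} \sin 2\theta\,d\theta = 0 = E(X)E(Y) $$
Thus, by Eq. (3.52), X and Y are uncorrelated.
(b)
$$ E(X^2) = \frac{1}{2\pi} \int_0^{2\pi} \cos^2\theta\,d\theta = \frac{1}{4\pi} \int_0^{2\pi} (1 + \cos 2\theta)\,d\theta = \frac{1}{2} $$
$$ E(Y^2) = \frac{1}{2\pi} \int_0^{2\pi} \sin^2\theta\,d\theta = \frac{1}{4\pi} \int_0^{2\pi} (1 - \cos 2\theta)\,d\theta = \frac{1}{2} $$
$$ E(X^2 Y^2) = \frac{1}{2\pi} \int_0^{2\pi} \cos^2\theta \sin^2\theta\,d\theta = \frac{1}{16\pi} \int_0^{2\pi} (1 - \cos 4\theta)\,d\theta = \frac{1}{8} $$
Hence,
$$ E(X^2 Y^2) = \frac{1}{8} \ne \frac{1}{4} = E(X^2) E(Y^2) $$
If X and Y were independent, then by Eq. (4.36), we would have $E(X^2 Y^2) = E(X^2) E(Y^2)$. Therefore, X and Y are not independent.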
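
The moments used in this problem can be reproduced by numerical integration over θ. The following is a small sketch; the helper `expect` and the midpoint rule are illustrative choices, not part of the text.

```python
import math

def expect(g, steps=10_000):
    """E[g(Theta)] for Theta uniform over (0, 2*pi), by the midpoint rule."""
    h = 2.0 * math.pi / steps
    total = sum(g((k + 0.5) * h) for k in range(steps))
    return total * h / (2.0 * math.pi)

e_x = expect(math.cos)
e_y = expect(math.sin)
e_xy = expect(lambda t: math.cos(t) * math.sin(t))
e_x2 = expect(lambda t: math.cos(t) ** 2)
e_y2 = expect(lambda t: math.sin(t) ** 2)
e_x2y2 = expect(lambda t: (math.cos(t) * math.sin(t)) ** 2)

covariance = e_xy - e_x * e_y   # close to 0, so X and Y are uncorrelated
gap = e_x2y2 - e_x2 * e_y2      # close to 1/8 - 1/4, so X and Y are not independent
print(covariance, gap)
```

A zero covariance together with a nonzero gap between $E(X^2Y^2)$ and $E(X^2)E(Y^2)$ is exactly the pattern the two parts of the problem establish.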


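4.49. Let $X_1, \ldots, X_n$ be n r.v.’s. Show that
$$ \mathrm{Var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\, \mathrm{Cov}(X_i, X_j) \tag{4.135} $$
If $X_1, \ldots, X_n$ are pairwise independent, then
$$ \mathrm{Var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i) \tag{4.136} $$

Let
$$ Y = \sum_{i=1}^{n} a_i X_i $$
Then by Eq. (4.132), we have
$$ \begin{aligned} \mathrm{Var}(Y) = E\{[Y - E(Y)]^2\} &= E\left\{ \left( \sum_{i=1}^{n} a_i [X_i - E(X_i)] \right)^2 \right\} \\ &= E\left\{ \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j [X_i - E(X_i)][X_j - E(X_j)] \right\} \\ &= \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j E\{[X_i - E(X_i)][X_j - E(X_j)]\} \\ &= \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\, \mathrm{Cov}(X_i, X_j) \end{aligned} $$
If $X_1, \ldots, X_n$ are pairwise independent, then (Prob. 3.22)
$$ \mathrm{Cov}(X_i, X_j) = \begin{cases} \mathrm{Var}(X_i) & i = j \\ 0 & i \ne j \end{cases} $$
and Eq. (4.135) reduces to
$$ \mathrm{Var}\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i) $$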


4.50. Verify Jensen’s inequality (4.40),
$$ E[g(X)] \ge g(E[X]) \qquad g(x) \text{ is a convex function} $$

Expanding g(x) in a Taylor’s series expansion around μ = E(X), we have
$$ g(x) = g(\mu) + g'(\mu)(x - \mu) + \frac{1}{2} g''(\xi)(x - \mu)^2 $$
where ξ is some value between x and μ. If g(x) is convex, then g″(ξ) ≥ 0 and
we obtain
$$ g(x) \ge g(\mu) + g'(\mu)(x - \mu) $$
Hence,
$$ g(X) \ge g(\mu) + g'(\mu)(X - \mu) \tag{4.137} $$
Taking expectation, we get
$$ E[g(X)] \ge g(\mu) + g'(\mu) E(X - \mu) = g(\mu) = g(E[X]) $$


4.51. Verify Cauchy-Schwarz inequality (4.41),
$$ E(|XY|) \le \sqrt{E(X^2) E(Y^2)} $$
We have
$$ E\{(\alpha |X| - |Y|)^2\} = E(X^2)\,\alpha^2 - 2E(|XY|)\,\alpha + E(Y^2) \tag{4.138} $$
The discriminant of the quadratic in α appearing in Eq. (4.138) must be
nonpositive because the quadratic, being the expectation of a square, is
nonnegative for every real α and so cannot have two distinct real roots.
Therefore,
$$ (2E[|XY|])^2 - 4E(X^2) E(Y^2) \le 0 $$
and we obtain
$$ (E[|XY|])^2 \le E(X^2) E(Y^2) $$
or
$$ E[|XY|] \le \sqrt{E(X^2) E(Y^2)} $$

Probability Generating Functions 


4.52. Let X be a Bernoulli r.v. with parameter p.
(a) Find the probability generating function $G_X(z)$ of X.
(b) Find the mean and variance of X.

(a) From Eq. (2.32)
$$ p_X(x) = p^x (1 - p)^{1-x} = p^x q^{1-x} \qquad q = 1 - p \qquad x = 0, 1 $$
By Eq. (4.42)
$$ G_X(z) = \sum_{x=0}^{1} p_X(x) z^x = p_X(0) + p_X(1) z = q + pz \qquad q = 1 - p \tag{4.139} $$
(b) Differentiating Eq. (4.139), we have
$$ G_X'(z) = p \qquad G_X''(z) = 0 $$
Using Eqs. (4.49) and (4.55), we obtain
$$ \mu = E(X) = G_X'(1) = p $$
$$ \sigma^2 = \mathrm{Var}(X) = G_X'(1) + G_X''(1) - [G_X'(1)]^2 = p - p^2 = p(1 - p) $$


4.53. Let X be a binomial r.v. with parameters (n, p).
(a) Find the probability generating function $G_X(z)$ of X.
(b) Find P(X = 0) and P(X = 1).
(c) Find the mean and variance of X.


(a) From Eq. (2.36)
$$ p_X(x) = \binom{n}{x} p^x q^{n-x} \qquad q = 1 - p \qquad x = 0, 1, \ldots, n $$
By Eq. (4.42)
$$ G_X(z) = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x} z^x = \sum_{x=0}^{n} \binom{n}{x} (pz)^x q^{n-x} = (pz + q)^n \qquad q = 1 - p \tag{4.140} $$
(b) From Eqs. (4.47) and (4.140)
$$ P(X = 0) = G_X(0) = q^n = (1 - p)^n $$
Differentiating Eq. (4.140), we have
$$ G_X'(z) = np(pz + q)^{n-1} \tag{4.141} $$
Then from Eq. (4.48)
$$ P(X = 1) = G_X'(0) = npq^{n-1} = np(1 - p)^{n-1} $$
(c) Differentiating Eq. (4.141) again, we have
$$ G_X''(z) = n(n - 1) p^2 (pz + q)^{n-2} \tag{4.142} $$
Thus, using Eqs. (4.49) and (4.55), we obtain
$$ \mu = E(X) = G_X'(1) = np(p + q)^{n-1} = np \qquad \text{since } p + q = 1 $$
$$ \sigma^2 = \mathrm{Var}(X) = G_X'(1) + G_X''(1) - [G_X'(1)]^2 = np + n(n - 1)p^2 - n^2 p^2 = np(1 - p) $$


4.54. Let $X_1, X_2, \ldots, X_n$ be independent Bernoulli r.v.’s with the same parameter p,
and let $Y = X_1 + X_2 + \cdots + X_n$. Show that Y is a binomial r.v. with parameters
(n, p).


By Eq. (4.139)
$$ G_{X_i}(z) = q + pz \qquad q = 1 - p,\ i = 1, 2, \ldots, n $$
Now applying property 5 Eq. (4.52), we have
$$ G_Y(z) = \prod_{i=1}^{n} G_{X_i}(z) = (q + pz)^n \qquad q = 1 - p $$
Comparing with Eq. (4.140), we conclude that Y is a binomial r.v. with
parameters (n, p).
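
Because a probability generating function is a polynomial whose coefficients are the pmf values, the product of n Bernoulli generating functions can be expanded explicitly. The sketch below does this and compares the coefficients with the binomial pmf; the function names and the values n = 6, p = 0.3 are illustrative assumptions.

```python
from math import comb

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of z)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def pgf_of_bernoulli_sum(n, p):
    """Coefficients of (q + p z)^n, built as a product of n Bernoulli pgf's."""
    coeffs = [1.0]
    for _ in range(n):
        coeffs = poly_mul(coeffs, [1.0 - p, p])
    return coeffs

def binomial_pmf(n, p):
    """Binomial pmf values for x = 0, 1, ..., n."""
    return [comb(n, x) * p ** x * (1.0 - p) ** (n - x) for x in range(n + 1)]

n, p = 6, 0.3
coeffs = pgf_of_bernoulli_sum(n, p)
mean_from_pgf = sum(x * c for x, c in enumerate(coeffs))  # equals G'(1)
print(coeffs)
print(binomial_pmf(n, p))
```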


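4.55. Let X be a geometric r.v. with parameter p.
(a) Find the probability generating function $G_X(z)$ of X.
(b) Find the mean and variance of X.

(a) From Eq. (2.40) we have
$$ p_X(x) = P(X = x) = p q^{x-1} = (1 - p)^{x-1} p \qquad q = 1 - p \qquad x = 1, 2, \ldots $$
Then by Eq. (4.42)
$$ G_X(z) = \sum_{x=1}^{\infty} p q^{x-1} z^x = \frac{p}{q} \sum_{x=1}^{\infty} (zq)^x = \frac{p}{q} \left[ \sum_{x=0}^{\infty} (zq)^x - 1 \right] = \frac{p}{q} \left[ \frac{1}{1 - zq} - 1 \right] \qquad |zq| < 1 $$
Thus,
$$ G_X(z) = \frac{zp}{1 - zq} \qquad |z| < \frac{1}{q} \qquad q = 1 - p \tag{4.143} $$
(b) Differentiating Eq. (4.143), we have
$$ G_X'(z) = \frac{p}{1 - zq} + \frac{zpq}{(1 - zq)^2} = \frac{p}{(1 - zq)^2} \tag{4.144} $$
$$ G_X''(z) = \frac{2pq}{(1 - zq)^3} \tag{4.145} $$
Thus, by Eqs. (4.49) and (4.55), we obtain
$$ \mu = E(X) = G_X'(1) = \frac{p}{(1 - q)^2} = \frac{p}{p^2} = \frac{1}{p} $$
$$ \sigma^2 = \mathrm{Var}(X) = G_X'(1) + G_X''(1) - [G_X'(1)]^2 = \frac{1}{p} + \frac{2q}{p^2} - \frac{1}{p^2} = \frac{1 - p}{p^2} $$


4.56. Let X be a negative binomial r.v. with parameters p and k.
(a) Find the probability generating function $G_X(z)$ of X.
(b) Find the mean and variance of X.


(a) From Eq. (2.45)
$$ p_X(x) = P(X = x) = \binom{x - 1}{k - 1} p^k (1 - p)^{x-k} = \binom{x - 1}{k - 1} p^k q^{x-k} \qquad q = 1 - p \qquad x = k, k + 1, \ldots $$
By Eq. (4.42)
$$ \begin{aligned} G_X(z) &= \sum_{x=k}^{\infty} \binom{x - 1}{k - 1} p^k q^{x-k} z^x \\ &= p^k z^k \left[ 1 + k\,qz + \frac{k(k + 1)}{2!} (qz)^2 + \frac{k(k + 1)(k + 2)}{3!} (qz)^3 + \cdots \right] \\ &= p^k z^k (1 - qz)^{-k} = \left( \frac{zp}{1 - zq} \right)^k \qquad |z| < \frac{1}{q} \end{aligned} \tag{4.146} $$
(b) Differentiating Eq. (4.146), we have
$$ G_X'(z) = k p^k \frac{z^{k-1}}{(1 - qz)^{k+1}} $$
$$ G_X''(z) = k p^k \frac{z^{k-2} [(k - 1) + 2qz]}{(1 - qz)^{k+2}} $$
Then,
$$ G_X'(1) = \frac{k p^k}{(1 - q)^{k+1}} = \frac{k}{p} $$
$$ G_X''(1) = k p^k \frac{(k - 1) + 2(1 - p)}{p^{k+2}} = \frac{k(k + 1 - 2p)}{p^2} = \frac{k(k + 1)}{p^2} - \frac{2k}{p} $$
Thus, by Eqs. (4.49) and (4.55), we obtain
$$ \mu = E(X) = G_X'(1) = \frac{k}{p} $$
$$ \sigma^2 = \mathrm{Var}(X) = G_X'(1) + G_X''(1) - [G_X'(1)]^2 = \frac{k}{p} + \frac{k(k + 1)}{p^2} - \frac{2k}{p} - \frac{k^2}{p^2} = \frac{k(1 - p)}{p^2} $$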


4.57. Let $X_1, X_2, \ldots, X_k$ be independent geometric r.v.’s with the same parameter
p, and let $Y = X_1 + X_2 + \cdots + X_k$. Show that Y is a negative binomial r.v. with
parameters p and k.


By Eq. (4.143) (Prob. 4.55)
$$ G_X(z) = \frac{zp}{1 - zq} \qquad |z| < \frac{1}{q} \qquad q = 1 - p $$
Now applying Eq. (4.52), we have
$$ G_Y(z) = \prod_{i=1}^{k} G_{X_i}(z) = \left( \frac{zp}{1 - zq} \right)^k \qquad |z| < \frac{1}{q} $$
Comparing with Eq. (4.146) (Prob. 4.56), we conclude that Y is a negative
binomial r.v. with parameters p and k.


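4.58. Let X be a Poisson r.v. with parameter λ.
(a) Find the probability generating function $G_X(z)$ of X.
(b) Find the mean and variance of X.


(a) From Eq. (2.48)
$$ p_X(x) = P(X = x) = e^{-\lambda} \frac{\lambda^x}{x!} \qquad x = 0, 1, \ldots $$
By Eq. (4.42)
$$ G_X(z) = \sum_{x=0}^{\infty} e^{-\lambda} \frac{\lambda^x}{x!} z^x = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda z)^x}{x!} = e^{-\lambda} e^{\lambda z} = e^{\lambda(z - 1)} \tag{4.147} $$
(b) Differentiating Eq. (4.147), we have
$$ G_X'(z) = \lambda e^{\lambda(z - 1)} \qquad G_X''(z) = \lambda^2 e^{\lambda(z - 1)} $$
Then,
$$ G_X'(1) = \lambda \qquad G_X''(1) = \lambda^2 $$
Thus, by Eq. (4.49) and Eq. (4.55), we obtain
$$ \mu = E(X) = G_X'(1) = \lambda $$
$$ \sigma^2 = \mathrm{Var}(X) = G_X'(1) + G_X''(1) - [G_X'(1)]^2 = \lambda + \lambda^2 - \lambda^2 = \lambda $$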


4.59. Let $X_1, X_2, \ldots, X_n$ be independent Poisson r.v.’s with the same parameter λ
and let $Y = X_1 + X_2 + \cdots + X_n$. Show that Y is a Poisson r.v. with parameter
nλ.


Using Eq. (4.147) and Eq. (4.52), we have
$$ G_Y(z) = \prod_{i=1}^{n} G_{X_i}(z) = \prod_{i=1}^{n} e^{\lambda(z - 1)} = e^{n\lambda(z - 1)} \tag{4.148} $$
which indicates that Y is a Poisson r.v. with parameter nλ.


Moment Generating Functions
4.60. Let the moment of a discrete r.v. X be given by
$$ E(X^k) = 0.8 \qquad k = 1, 2, \ldots $$
(a) Find the moment generating function of X.
(b) Find P(X = 0) and P(X = 1).
(a) By Eq. (4.57), the moment generating function of X is
$$ \begin{aligned} M_X(t) &= 1 + tE(X) + \frac{t^2}{2!} E(X^2) + \cdots + \frac{t^k}{k!} E(X^k) + \cdots \\ &= 1 + 0.8 \left( t + \frac{t^2}{2!} + \cdots + \frac{t^k}{k!} + \cdots \right) = 1 + 0.8 \sum_{k=1}^{\infty} \frac{t^k}{k!} \\ &= 0.2 + 0.8 \sum_{k=0}^{\infty} \frac{t^k}{k!} = 0.2 + 0.8 e^t \end{aligned} \tag{4.149} $$
(b) By definition (4.56),
$$ M_X(t) = E(e^{tX}) = \sum_{i} e^{t x_i} p_X(x_i) \tag{4.150} $$
Thus, equating Eqs. (4.149) and (4.150), we obtain
$$ p_X(0) = P(X = 0) = 0.2 \qquad p_X(1) = P(X = 1) = 0.8 $$


4.61. Let X be a Bernoulli r.v.
(a) Find the moment generating function of X.
(b) Find the mean and variance of X.


(a) By definition (4.56) and Eq. (2.32),
$$ M_X(t) = E(e^{tX}) = \sum_{i} e^{t x_i} p_X(x_i) = e^{t(0)} p_X(0) + e^{t(1)} p_X(1) = (1 - p) + p e^t \tag{4.151} $$
which can also be obtained by substituting z by $e^t$ in Eq. (4.139).
(b) By Eq. (4.58),
$$ E(X) = M_X'(0) = p e^t \big|_{t=0} = p $$
$$ E(X^2) = M_X''(0) = p e^t \big|_{t=0} = p $$
Hence,
$$ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p) $$


4.62. Let X be a binomial r.v. with parameters (n, p).
(a) Find the moment generating function of X.
(b) Find the mean and variance of X.


(a) By definition (4.56) and Eq. (2.36), and letting q = 1 − p, we get
$$ M_X(t) = E(e^{tX}) = \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k q^{n-k} = \sum_{k=0}^{n} \binom{n}{k} (p e^t)^k q^{n-k} = (q + p e^t)^n \tag{4.152} $$
which can also be obtained by substituting z by $e^t$ in Eq. (4.140).
(b) The first two derivatives of $M_X(t)$ are
$$ M_X'(t) = n(q + p e^t)^{n-1} p e^t $$
$$ M_X''(t) = n(q + p e^t)^{n-1} p e^t + n(n - 1)(q + p e^t)^{n-2} (p e^t)^2 $$
Thus, by Eq. (4.58),
$$ \mu_X = E(X) = M_X'(0) = np $$
$$ E(X^2) = M_X''(0) = np + n(n - 1) p^2 $$
Hence,
$$ \sigma_X^2 = E(X^2) - [E(X)]^2 = np(1 - p) $$


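4.63. Let X be a Poisson r.v. with parameter λ.
(a) Find the moment generating function of X.
(b) Find the mean and variance of X.


(a) By definition (4.56) and Eq. (2.48),
$$ M_X(t) = E(e^{tX}) = \sum_{i=0}^{\infty} e^{ti} e^{-\lambda} \frac{\lambda^i}{i!} = e^{-\lambda} \sum_{i=0}^{\infty} \frac{(\lambda e^t)^i}{i!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)} \tag{4.153} $$
which can also be obtained by substituting z by $e^t$ in Eq. (4.147).
(b) The first two derivatives of $M_X(t)$ are
$$ M_X'(t) = \lambda e^t e^{\lambda(e^t - 1)} $$
$$ M_X''(t) = (\lambda e^t)^2 e^{\lambda(e^t - 1)} + \lambda e^t e^{\lambda(e^t - 1)} $$
Thus, by Eq. (4.58),
$$ E(X) = M_X'(0) = \lambda \qquad E(X^2) = M_X''(0) = \lambda^2 + \lambda $$
Hence,
$$ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda $$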
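
The relation between the derivatives of a moment generating function at t = 0 and the moments can also be checked numerically. This is a small sketch using central differences on the Poisson moment generating function of Eq. (4.153); the step size h, the value λ = 3, and the function names are illustrative assumptions.

```python
import math

def poisson_mgf(lam):
    """Moment generating function exp(lam*(e^t - 1)) of a Poisson r.v."""
    return lambda t: math.exp(lam * (math.exp(t) - 1.0))

def first_two_moments(mgf, h=1e-4):
    """Central-difference estimates of M'(0) and M''(0)."""
    m1 = (mgf(h) - mgf(-h)) / (2.0 * h)
    m2 = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / (h * h)
    return m1, m2

lam = 3.0
m1, m2 = first_two_moments(poisson_mgf(lam))
variance = m2 - m1 * m1
print(m1, m2, variance)  # close to lam, lam**2 + lam, and lam
```

The same `first_two_moments` helper can be applied to any moment generating function that is finite in a neighborhood of t = 0.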


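4.64. Let X be an exponential r.v. with parameter λ.
(a) Find the moment generating function of X.
(b) Find the mean and variance of X.


(a) By definition (4.56) and Eq. (2.60),
$$ M_X(t) = E(e^{tX}) = \int_0^{\infty} \lambda e^{-\lambda x} e^{tx}\,dx = \frac{\lambda}{t - \lambda}\, e^{(t - \lambda)x} \Big|_0^{\infty} = \frac{\lambda}{\lambda - t} \qquad \lambda > t \tag{4.154} $$
(b) The first two derivatives of $M_X(t)$ are
$$ M_X'(t) = \frac{\lambda}{(\lambda - t)^2} \qquad M_X''(t) = \frac{2\lambda}{(\lambda - t)^3} $$
Thus, by Eq. (4.58),
$$ E(X) = M_X'(0) = \frac{1}{\lambda} \qquad E(X^2) = M_X''(0) = \frac{2}{\lambda^2} $$
Hence,
$$ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \left( \frac{1}{\lambda} \right)^2 = \frac{1}{\lambda^2} $$
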
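4.65. Let X be a gamma r.v. with parameters (α, λ).
(a) Find the moment generating function of X.
(b) Find the mean and variance of X.
(a) By definition (4.56) and Eq. (2.65)
$$ M_X(t) = E(e^{tX}) = \int_0^{\infty} e^{tx}\, \frac{\lambda^\alpha x^{\alpha - 1} e^{-\lambda x}}{\Gamma(\alpha)}\,dx = \frac{\lambda^\alpha}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha - 1} e^{-(\lambda - t)x}\,dx $$
Let y = (λ − t)x, dy = (λ − t) dx. Then
$$ M_X(t) = \frac{\lambda^\alpha}{\Gamma(\alpha)} \int_0^{\infty} \left( \frac{y}{\lambda - t} \right)^{\alpha - 1} e^{-y}\, \frac{dy}{\lambda - t} = \frac{\lambda^\alpha}{(\lambda - t)^\alpha\, \Gamma(\alpha)} \int_0^{\infty} y^{\alpha - 1} e^{-y}\,dy $$
Since $\int_0^{\infty} y^{\alpha - 1} e^{-y}\,dy = \Gamma(\alpha)$ (Eq. (2.66)) we obtain
$$ M_X(t) = \left( \frac{\lambda}{\lambda - t} \right)^{\alpha} \tag{4.155} $$
(b) The first two derivatives of $M_X(t)$ are
$$ M_X'(t) = \alpha \lambda^\alpha (\lambda - t)^{-\alpha - 1} \qquad M_X''(t) = \alpha(\alpha + 1) \lambda^\alpha (\lambda - t)^{-\alpha - 2} $$
Thus, by Eq. (4.58)
$$ \mu = E(X) = M_X'(0) = \frac{\alpha}{\lambda} \qquad E(X^2) = M_X''(0) = \frac{\alpha(\alpha + 1)}{\lambda^2} $$
Hence,
$$ \sigma^2 = \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{\alpha(\alpha + 1)}{\lambda^2} - \frac{\alpha^2}{\lambda^2} = \frac{\alpha}{\lambda^2} $$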


4.66. Find the moment generating function of the standard normal r.v. X = N(0; 1),
and calculate the first three moments of X.


By definition (4.56) and Eq. (2.71),
$$ M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} e^{tx}\,dx $$
Combining the exponents and completing the square, that is,
$$ \frac{x^2}{2} - tx = \frac{(x - t)^2}{2} - \frac{t^2}{2} $$
we obtain
$$ M_X(t) = e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(x - t)^2/2}\,dx = e^{t^2/2} \tag{4.156} $$
since the integrand is the pdf of N(t; 1).
Differentiating $M_X(t)$ with respect to t three times, we have
$$ M_X'(t) = t e^{t^2/2} \qquad M_X''(t) = (t^2 + 1) e^{t^2/2} \qquad M_X^{(3)}(t) = (t^3 + 3t) e^{t^2/2} $$
Thus, by Eq. (4.58),
$$ E(X) = M_X'(0) = 0 \qquad E(X^2) = M_X''(0) = 1 \qquad E(X^3) = M_X^{(3)}(0) = 0 $$


4.67. Let Y = aX + b. Let $M_X(t)$ be the moment generating function of X. Show
that the moment generating function of Y is given by
$$ M_Y(t) = e^{tb} M_X(at) \tag{4.157} $$
By Eqs. (4.56) and (4.129),
$$ M_Y(t) = E(e^{tY}) = E[e^{t(aX + b)}] = e^{tb} E(e^{atX}) = e^{tb} M_X(at) $$


4.68. Find the moment generating function of a normal r.v. N(μ; σ²).
If X is N(0; 1), then from Prob. 4.1 (or Prob. 4.43), we see that Y = σX +
μ is N(μ; σ²). Then by setting a = σ and b = μ in Eq. (4.157) (Prob. 4.67)
and using Eq. (4.156), we get
$$ M_Y(t) = e^{\mu t} M_X(\sigma t) = e^{\mu t} e^{(\sigma t)^2/2} = e^{\mu t + \sigma^2 t^2/2} \tag{4.158} $$


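4.69. Suppose that r.v. X = N(0; 1). Find the moment generating function of $Y = X^2$.


By definition (4.56) and Eq. (2.71)
$$ M_Y(t) = E(e^{tY}) = E(e^{tX^2}) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{tx^2} e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\left(\frac{1}{2} - t\right)x^2}\,dx $$
Using $\int_{-\infty}^{\infty} e^{-ax^2}\,dx = \sqrt{\pi/a}$ (a > 0), we obtain
$$ M_Y(t) = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{\pi}{\frac{1}{2} - t}} = \frac{1}{\sqrt{1 - 2t}} \qquad t < \frac{1}{2} \tag{4.159} $$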


4.70. Let $X_1, \ldots, X_n$ be n independent r.v.’s and let the moment generating function
of $X_i$ be $M_{X_i}(t)$. Let $Y = X_1 + \cdots + X_n$. Find the moment generating function
of Y.


By definition (4.56),
$$ \begin{aligned} M_Y(t) = E(e^{tY}) &= E[e^{t(X_1 + \cdots + X_n)}] = E(e^{tX_1} \cdots e^{tX_n}) \\ &= E(e^{tX_1}) \cdots E(e^{tX_n}) \qquad \text{(independence)} \\ &= M_{X_1}(t) \cdots M_{X_n}(t) \end{aligned} \tag{4.160} $$


4.71. Show that if $X_1, \ldots, X_n$ are independent Poisson r.v.’s $X_i$ having parameter $\lambda_i$,
then $Y = X_1 + \cdots + X_n$ is also a Poisson r.v. with parameter $\lambda = \lambda_1 + \cdots + \lambda_n$.


Using Eqs. (4.160) and (4.153), the moment generating function of Y is
$$ M_Y(t) = \prod_{i=1}^{n} e^{\lambda_i (e^t - 1)} = e^{\left( \sum \lambda_i \right)(e^t - 1)} = e^{\lambda(e^t - 1)} $$
which is the moment generating function of a Poisson r.v. with parameter λ.
Hence, Y is a Poisson r.v. with parameter $\lambda = \sum \lambda_i = \lambda_1 + \cdots + \lambda_n$.


Note that Prob. 4.17 is a special case for n = 2.


4.72. Show that if $X_1, \ldots, X_n$ are independent normal r.v.’s and $X_i = N(\mu_i; \sigma_i^2)$,
then $Y = X_1 + \cdots + X_n$ is also a normal r.v. with mean $\mu = \mu_1 + \cdots + \mu_n$ and
variance $\sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$.


Using Eqs. (4.160) and (4.158), the moment generating function of Y is
$$ M_Y(t) = \prod_{i=1}^{n} e^{\mu_i t + \sigma_i^2 t^2/2} = e^{\left( \sum \mu_i \right) t + \left( \sum \sigma_i^2 \right) t^2/2} = e^{\mu t + \sigma^2 t^2/2} $$
which is the moment generating function of a normal r.v. with mean μ and
variance σ². Hence, Y is a normal r.v. with mean $\mu = \mu_1 + \cdots + \mu_n$ and
variance $\sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2$.


Note that Prob. 4.20 is a special case for n = 2 with $\mu_i = 0$ and $\sigma_i^2 = 1$.


4.73. Find the moment generating function of a gamma r.v. Y with parameters (n,
λ).


From Prob. 4.39, we see that if $X_1, \ldots, X_n$ are independent exponential r.v.’s,
each with parameter λ, then $Y = X_1 + \cdots + X_n$ is a gamma r.v. with
parameters (n, λ). Thus, by Eqs. (4.160) and (4.154), the moment generating
function of Y is
$$ M_Y(t) = \prod_{i=1}^{n} \left( \frac{\lambda}{\lambda - t} \right) = \left( \frac{\lambda}{\lambda - t} \right)^n \tag{4.161} $$


4.74. Suppose that $X_1, X_2, \ldots, X_n$ are independent standard normal r.v.’s and $X_i$ =
N(0; 1). Let $Y = X_1^2 + X_2^2 + \cdots + X_n^2$.
(a) Find the moment generating function of Y.
(b) Find the mean and variance of Y.


(a) By Eqs. (4.159) and (4.160)
$$ M_Y(t) = E(e^{tY}) = \prod_{i=1}^{n} (1 - 2t)^{-1/2} = (1 - 2t)^{-n/2} \tag{4.162} $$
(b) Differentiating Eq. (4.162), we obtain
$$ M_Y'(t) = n(1 - 2t)^{-\frac{n}{2} - 1} \qquad M_Y''(t) = n(n + 2)(1 - 2t)^{-\frac{n}{2} - 2} $$
Thus, by Eq. (4.58)
$$ E(Y) = M_Y'(0) = n \tag{4.163} $$
$$ E(Y^2) = M_Y''(0) = n(n + 2) \tag{4.164} $$
Hence,
$$ \mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = n(n + 2) - n^2 = 2n \tag{4.165} $$
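
The mean n and variance 2n of the sum of squares can be checked by simulation. The following is a minimal sketch; the function names, the choice n = 4, the number of trials, and the seed are all illustrative assumptions.

```python
import random

def chi_square_samples(n_dof, trials, seed=3):
    """Samples of Y = X_1^2 + ... + X_n^2 with X_i independent N(0; 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_dof))
            for _ in range(trials)]

def sample_mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v

n_dof = 4
mean_y, var_y = sample_mean_and_variance(chi_square_samples(n_dof, 50_000))
print(mean_y, var_y)  # compare with n_dof and 2 * n_dof
```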


Characteristic Functions 


4.75. The r.v. X can take on the values $x_1 = -1$ and $x_2 = +1$ with pmf’s $p_X(x_1) =
p_X(x_2) = 0.5$. Determine the characteristic function of X.


By definition (4.66), the characteristic function of X is
$$ \Psi_X(\omega) = 0.5 e^{-j\omega} + 0.5 e^{j\omega} = \tfrac{1}{2}\left( e^{j\omega} + e^{-j\omega} \right) = \cos\omega $$

4.76. Find the characteristic function of a Cauchy r.v. X with parameter a and pdf
given by
$$ f_X(x) = \frac{a}{\pi(x^2 + a^2)} \qquad -\infty < x < \infty $$
By direct integration (or from the Table of Fourier transforms in Appendix
B), we have the following Fourier transform pair:
$$ e^{-a|x|} \leftrightarrow \frac{2a}{\omega^2 + a^2} $$
Now, by the duality property of the Fourier transform, we have the following
Fourier transform pair:
$$ \frac{2a}{x^2 + a^2} \leftrightarrow 2\pi e^{-a|-\omega|} = 2\pi e^{-a|\omega|} $$
or (by the linearity property of the Fourier transform)
$$ \frac{a}{\pi(x^2 + a^2)} \leftrightarrow e^{-a|\omega|} $$
Thus, the characteristic function of X is
$$ \Psi_X(\omega) = e^{-a|\omega|} \tag{4.166} $$
Note that the moment generating function of the Cauchy r.v. X does not
exist, since $E(X^n) \to \infty$ for n ≥ 2.


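4.77. The characteristic function of a r.v. X is given by
$$ \Psi_X(\omega) = \begin{cases} 1 - |\omega| & |\omega| \le 1 \\ 0 & |\omega| > 1 \end{cases} $$
Find the pdf of X.
From formula (4.67), we obtain the pdf of X as
$$ \begin{aligned} f_X(x) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \Psi_X(\omega) e^{-j\omega x}\,d\omega \\ &= \frac{1}{2\pi} \left[ \int_{-1}^{0} (1 + \omega) e^{-j\omega x}\,d\omega + \int_{0}^{1} (1 - \omega) e^{-j\omega x}\,d\omega \right] \\ &= \frac{1}{2\pi x^2} \left( 2 - e^{jx} - e^{-jx} \right) = \frac{1}{\pi x^2} (1 - \cos x) \\ &= \frac{1}{2\pi} \left[ \frac{\sin(x/2)}{x/2} \right]^2 \qquad -\infty < x < \infty \end{aligned} $$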


4.78. Find the characteristic function of a normal r.v. X = N(μ; σ²).


The moment generating function of N(μ; σ²) is [Eq. (4.158)]
$$ M_X(t) = e^{\mu t + \sigma^2 t^2/2} $$
Thus, the characteristic function of N(μ; σ²) is obtained by setting t = jω in
$M_X(t)$; that is,
$$ \Psi_X(\omega) = M_X(t) \big|_{t = j\omega} = e^{j\mu\omega - \sigma^2 \omega^2/2} \tag{4.167} $$


4.79. Let Y = aX + b. Show that if $\Psi_X(\omega)$ is the characteristic function of X, then
the characteristic function of Y is given by
$$ \Psi_Y(\omega) = e^{j\omega b}\, \Psi_X(a\omega) \tag{4.168} $$
By definition (4.66),
$$ \Psi_Y(\omega) = E(e^{j\omega Y}) = E[e^{j\omega(aX + b)}] = e^{j\omega b} E(e^{ja\omega X}) = e^{j\omega b}\, \Psi_X(a\omega) $$


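4.80. Using the characteristic function technique, redo part (b) of Prob. 4.18.


Let Z = X + Y, where X and Y are independent. Then
$$ \Psi_Z(\omega) = E(e^{j\omega Z}) = E(e^{j\omega(X + Y)}) = E(e^{j\omega X})\, E(e^{j\omega Y}) = \Psi_X(\omega)\, \Psi_Y(\omega) \tag{4.169} $$
Applying the convolution theorem of the Fourier transform (Appendix B),
we obtain
$$ f_Z(z) = \mathscr{F}^{-1}[\Psi_Z(\omega)] = \mathscr{F}^{-1}[\Psi_X(\omega)\, \Psi_Y(\omega)] = f_X(z) * f_Y(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\,dx $$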


The Laws of Large Numbers and the Central Limit Theorem 
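4.81. Verify the weak law of large numbers (4.74); that is,
$$ \lim_{n \to \infty} P\left( |\bar{X}_n - \mu| > \varepsilon \right) = 0 \qquad \text{for any } \varepsilon $$
where $\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$ and $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$.


Using Eqs. (4.132) and (4.136), we have
$$ E(\bar{X}_n) = \mu \qquad \text{and} \qquad \mathrm{Var}(\bar{X}_n) = \frac{\sigma^2}{n} \tag{4.170} $$
Then it follows from Chebyshev’s inequality [Eq. (2.116)] (Prob. 2.39) that
$$ P\left( |\bar{X}_n - \mu| > \varepsilon \right) \le \frac{\sigma^2}{n\varepsilon^2} \tag{4.171} $$
Since $\lim_{n \to \infty} \sigma^2/(n\varepsilon^2) = 0$, we get
$$ \lim_{n \to \infty} P\left( |\bar{X}_n - \mu| > \varepsilon \right) = 0 $$
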
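
The bound of Eq. (4.171) is simple to evaluate, and it can be inverted to find how large n must be for the bound to fall below a target value. The sketch below does both with exact rational arithmetic; the function names and the numbers used (σ² = 1, ε = 1/10, target 1/20) are illustrative assumptions.

```python
from fractions import Fraction
from math import ceil

def chebyshev_bound(variance, n, eps):
    """Upper bound sigma^2 / (n * eps^2) from Eq. (4.171), capped at 1."""
    return min(Fraction(1), Fraction(variance) / (n * Fraction(eps) ** 2))

def samples_needed(variance, eps, delta):
    """Smallest n for which sigma^2 / (n * eps^2) is at most delta."""
    return ceil(Fraction(variance) / (Fraction(delta) * Fraction(eps) ** 2))

# Illustrative numbers: sigma^2 = 1, eps = 1/10, delta = 1/20.
n_min = samples_needed(1, Fraction(1, 10), Fraction(1, 20))
print(n_min, chebyshev_bound(1, n_min, Fraction(1, 10)))
```

Chebyshev’s inequality holds for any distribution with finite variance, so the n obtained this way is typically conservative.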
4.82. Let X be a r.v. with pdf $f_X(x)$ and let $X_1, \ldots, X_n$ be a set of independent r.v.’s
each with pdf $f_X(x)$. Then the set of r.v.’s $X_1, \ldots, X_n$ is called a random
sample of size n of X. The sample mean is defined by
$$ \bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n) = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{4.172} $$
Let $X_1, \ldots, X_n$ be a random sample of X with mean μ and variance σ². How
many samples of X should be taken if the probability that the sample mean
will not deviate from the true mean μ by more than σ/10 is at least 0.95?


Setting ε = σ/10 in Eq. (4.171), we have
$$ P\left( |\bar{X}_n - \mu| > \frac{\sigma}{10} \right) \le \frac{\sigma^2}{n\sigma^2/100} = \frac{100}{n} $$
or
$$ P\left( |\bar{X}_n - \mu| \le \frac{\sigma}{10} \right) \ge 1 - \frac{100}{n} $$
Thus, if we want this probability to be at least 0.95, we must have 100/n ≤
0.05 or n ≥ 100/0.05 = 2000.


4.83. Verify the central limit theorem (4.77).


Let $X_1, \ldots, X_n$ be a sequence of independent, identically distributed r.v.’s
with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$. Consider the sum $S_n = X_1 + \cdots + X_n$. Then
by Eqs. (4.132) and (4.136), we have $E(S_n) = n\mu$ and $\mathrm{Var}(S_n) = n\sigma^2$. Let
$$ Z_n = \frac{S_n - n\mu}{\sqrt{n}\,\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right) \tag{4.173} $$
Then by Eqs. (4.129) and (4.130), we have $E(Z_n) = 0$ and $\mathrm{Var}(Z_n) = 1$. Let
M(t) be the moment generating function of the standardized r.v. $Y_i = (X_i - \mu)/\sigma$.
Since $E(Y_i) = 0$ and $E(Y_i^2) = \mathrm{Var}(Y_i) = 1$, by Eq. (4.58), we have
$$ M(0) = 1 \qquad M'(0) = E(Y_i) = 0 \qquad M''(0) = E(Y_i^2) = 1 $$
Given that M′(t) and M″(t) are continuous functions of t, a Taylor (or
Maclaurin) expansion of M(t) about t = 0 can be expressed as
$$ M(t) = M(0) + M'(0)\,t + M''(t_1) \frac{t^2}{2!} = 1 + M''(t_1) \frac{t^2}{2} $$
for some $t_1$ between 0 and t. By adding and subtracting $t^2/2$, we have
$$ M(t) = 1 + \frac{1}{2} t^2 + \frac{1}{2} [M''(t_1) - 1]\, t^2 \tag{4.174} $$
Now, by Eqs. (4.157) and (4.160), the moment generating function of $Z_n$ is
$$ M_{Z_n}(t) = \left[ M\!\left( \frac{t}{\sqrt{n}} \right) \right]^n \tag{4.175} $$
Using Eq. (4.174), Eq. (4.175) can be written as
$$ M_{Z_n}(t) = \left[ 1 + \frac{1}{2} \left( \frac{t}{\sqrt{n}} \right)^2 + \frac{1}{2} [M''(t_1) - 1] \left( \frac{t}{\sqrt{n}} \right)^2 \right]^n $$
where now $t_1$ is between 0 and $t/\sqrt{n}$. Since M″(t) is continuous at t = 0 and
$t_1 \to 0$ as $n \to \infty$, we have
$$ \lim_{n \to \infty} [M''(t_1) - 1] = M''(0) - 1 = 1 - 1 = 0 $$
Thus, from elementary calculus, $\lim_{n \to \infty} (1 + x/n)^n = e^x$, and we obtain
$$ \lim_{n \to \infty} M_{Z_n}(t) = \lim_{n \to \infty} \left\{ 1 + \frac{t^2}{2n} + [M''(t_1) - 1] \frac{t^2}{2n} \right\}^n = e^{t^2/2} $$
The right-hand side is the moment generating function of the standard
normal r.v. Z = N(0; 1) [Eq. (4.156)]. Hence, by Lemma 4.3 of the moment
generating function,
$$ \lim_{n \to \infty} Z_n = N(0; 1) $$
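
The convergence of the standardized sum to N(0; 1) can be observed by simulation. The sketch below is an illustration under assumptions not taken from the text: the $X_i$ are uniform over (0, 1), with μ = 1/2 and σ² = 1/12, n = 30, and the function names are hypothetical.

```python
import math
import random

def standard_normal_cdf(x):
    """cdf of a standard normal r.v., written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def standardized_sums(n, trials, seed=4):
    """Samples of Z_n = (S_n - n*mu) / (sigma*sqrt(n)) for X_i uniform over (0, 1)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    scale = sigma * math.sqrt(n)
    return [(sum(rng.random() for _ in range(n)) - n * mu) / scale
            for _ in range(trials)]

z_samples = standardized_sums(n=30, trials=20_000)
empirical = sum(1 for z in z_samples if z <= 1.0) / len(z_samples)
print(empirical, standard_normal_cdf(1.0))
```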


4.84. Let $X_1, \ldots, X_n$ be n independent Cauchy r.v.’s with identical pdf shown in
Prob. 4.76. Let
$$ Y_n = \frac{1}{n}(X_1 + \cdots + X_n) = \frac{1}{n} \sum_{i=1}^{n} X_i $$
(a) Find the characteristic function of $Y_n$.
(b) Find the pdf of $Y_n$.
(c) Does the central limit theorem hold?


(a) From Eq. (4.166), the characteristic function of $X_i$ is
$$ \Psi_{X_i}(\omega) = e^{-a|\omega|} $$
Let $Y = X_1 + \cdots + X_n$. Then the characteristic function of Y is
$$ \Psi_Y(\omega) = E(e^{j\omega Y}) = E[e^{j\omega(X_1 + \cdots + X_n)}] = \prod_{i=1}^{n} \Psi_{X_i}(\omega) = e^{-na|\omega|} \tag{4.176} $$
Now $Y_n = (1/n)Y$. Thus, by Eq. (4.168), the characteristic function of $Y_n$
is
$$ \Psi_{Y_n}(\omega) = \Psi_Y\!\left( \frac{\omega}{n} \right) = e^{-na|\omega/n|} = e^{-a|\omega|} \tag{4.177} $$
(b) Equation (4.177) indicates that $Y_n$ is also a Cauchy r.v. with parameter a,
and its pdf is the same as that of $X_i$.


(c) Since the characteristic function of $Y_n$ is independent of n and so is its
pdf, $Y_n$ does not tend to a normal r.v. as n → ∞. The r.v.’s in the
given sequence do not have finite variance (indeed, the mean of a Cauchy
r.v. is not even defined), so the hypotheses of the central limit theorem
(4.77) are not satisfied. A generalized form of the central limit theorem
does hold for r.v.’s with infinite variance, with a stable distribution as the
limit, and the Cauchy distribution is one such stable distribution.
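
The fact that averaging does not concentrate a Cauchy sample can be seen numerically. For a Cauchy r.v. with parameter a = 1, P(|Y| ≤ 1) = 1/2, so the fraction of sample means falling in [−1, 1] should stay near 1/2 for every n. The sketch below assumes a = 1 and uses inverse-cdf sampling; the function names are illustrative.

```python
import math
import random

def cauchy_sample(rng, a=1.0):
    """Inverse-cdf sampling: a*tan(pi*(U - 1/2)) is Cauchy with parameter a."""
    return a * math.tan(math.pi * (rng.random() - 0.5))

def fraction_within_one(n, trials, seed=5):
    """Estimate P(|Y_n| <= 1) for the sample mean Y_n of n Cauchy r.v.'s (a = 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y_n = sum(cauchy_sample(rng) for _ in range(n)) / n
        hits += abs(y_n) <= 1.0
    return hits / trials

spread = {n: fraction_within_one(n, trials=10_000) for n in (1, 10, 100)}
print(spread)
```

For r.v.’s with finite variance the same experiment would show the fraction approaching 1 as n grows, in line with the weak law of large numbers.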


4.85. Let Y be a binomial r.v. with parameters (n, p). Using the central limit
theorem, derive the approximation formula
$$ P(Y \le y) \approx \Phi\!\left( \frac{y - np}{\sqrt{np(1 - p)}} \right) \tag{4.178} $$
where Φ(z) is the cdf of a standard normal r.v. [Eq. (2.73)].


We saw in Prob. 4.54 that if $X_1, \ldots, X_n$ are independent Bernoulli r.v.’s, each
with parameter p, then $Y = X_1 + \cdots + X_n$ is a binomial r.v. with parameters (n,
p). Since $X_i$’s are independent, we can apply the central limit theorem to the
r.v. $Z_n$ defined by
$$ Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( \frac{X_i - E(X_i)}{\sqrt{\mathrm{Var}(X_i)}} \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left( \frac{X_i - p}{\sqrt{p(1 - p)}} \right) \tag{4.179} $$
Thus, for large n, $Z_n$ is normally distributed and
$$ P(Z_n \le x) \approx \Phi(x) \tag{4.180} $$
Substituting Eq. (4.179) into Eq. (4.180) gives
$$ P\left[ \frac{1}{\sqrt{np(1 - p)}} \left( \sum_{i=1}^{n} X_i - np \right) \le x \right] = P\left[ Y \le x\sqrt{np(1 - p)} + np \right] \approx \Phi(x) $$
or
$$ P(Y \le y) \approx \Phi\!\left( \frac{y - np}{\sqrt{np(1 - p)}} \right) $$
Because we are approximating a discrete distribution by a continuous one, a
slightly better approximation is given by
$$ P(Y \le y) \approx \Phi\!\left( \frac{y + \frac{1}{2} - np}{\sqrt{np(1 - p)}} \right) \tag{4.181} $$
Formula (4.181) is referred to as a continuity correction of Eq. (4.178).


4.86. Let Y be a Poisson r.v. with parameter λ. Using the central limit theorem,
derive approximation formula:
$$ P(Y \le y) \approx \Phi\!\left( \frac{y - \lambda}{\sqrt{\lambda}} \right) \tag{4.182} $$


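We saw in Prob. 4.71 that if $X_1, \ldots, X_n$ are independent Poisson r.v.’s $X_i$
having parameter $\lambda_i$, then $Y = X_1 + \cdots + X_n$ is also a Poisson r.v. with
parameter $\lambda = \lambda_1 + \cdots + \lambda_n$. Using this fact, we can view a Poisson r.v. Y with
parameter λ as a sum of independent Poisson r.v.’s $X_i$, i = 1, ..., n, each with
parameter λ/n; that is,
$$ Y = X_1 + \cdots + X_n \qquad E(X_i) = \frac{\lambda}{n} = \mathrm{Var}(X_i) $$
The central limit theorem then implies that the r.v. Z defined by
$$ Z = \frac{Y - E(Y)}{\sqrt{\mathrm{Var}(Y)}} = \frac{Y - \lambda}{\sqrt{\lambda}} \tag{4.183} $$
is approximately normal and
$$ P(Z \le z) \approx \Phi(z) \tag{4.184} $$
Substituting Eq. (4.183) into Eq. (4.184) gives
$$ P\left( \frac{Y - \lambda}{\sqrt{\lambda}} \le z \right) = P\left( Y \le \sqrt{\lambda}\, z + \lambda \right) \approx \Phi(z) $$
or
$$ P(Y \le y) \approx \Phi\!\left( \frac{y - \lambda}{\sqrt{\lambda}} \right) $$
Again, using a continuity correction, a slightly better approximation is given
by
$$ P(Y \le y) \approx \Phi\!\left( \frac{y + \frac{1}{2} - \lambda}{\sqrt{\lambda}} \right) \tag{4.185} $$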
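
The effect of the continuity correction can be seen by comparing both approximations with the exact Poisson cdf. This is a small sketch; the function names and the values λ = 20, y = 20 are illustrative assumptions.

```python
import math

def phi(z):
    """cdf of a standard normal r.v., written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_cdf(y, lam):
    """Exact P(Y <= y) for a Poisson r.v. with parameter lam (y a nonnegative integer)."""
    term, total = math.exp(-lam), 0.0
    for k in range(y + 1):
        total += term
        term *= lam / (k + 1)  # next pmf value e^{-lam} lam^{k+1} / (k+1)!
    return total

def normal_approx(y, lam, corrected):
    """Normal approximation to P(Y <= y), with or without the 1/2 shift."""
    shift = 0.5 if corrected else 0.0
    return phi((y + shift - lam) / math.sqrt(lam))

lam, y = 20.0, 20
exact = poisson_cdf(y, lam)
plain = normal_approx(y, lam, corrected=False)           # form of Eq. (4.182)
with_correction = normal_approx(y, lam, corrected=True)  # form of Eq. (4.185)
print(exact, plain, with_correction)
```

For these values the uncorrected formula returns exactly Φ(0) = 0.5, while the corrected one lies noticeably closer to the exact cdf.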


SUPPLEMENTARY PROBLEMS 


4.87. Let Y = 2X + 3. Find the pdf of Y if X is a uniform r.v. over (1, 2).


4.88. Let X be a r.v. with pdf $f_X(x)$. Let Y = |X|. Find the pdf of Y in terms of
$f_X(x)$.


4.89. Let Y= sin X, where X is uniformly distributed over (0, 27). Find the pdf of 
Y. 


4.90. Let X and Y be independent r.v.'s, each uniformly distributed over (0, 1).
Let Z = X + Y, W = X − Y. Find the marginal pdf's of Z and W.


4.91. Let X and Y be independent exponential r.v.'s with parameters $\alpha$
and $\beta$, respectively. Find the pdf of (a) Z = X − Y; (b) Z = X/Y;
(c) Z = max(X, Y); (d) Z = min(X, Y).


4.92. Let X denote the number of heads obtained when three independent tossings
of a fair coin are made. Let $Y = X^2$. Find E(Y).


4.93. Let X be a uniform r.v. over (−1, 1). Let $Y = X^n$.
(a) Calculate the covariance of X and Y.
(b) Calculate the correlation coefficient of X and Y.


4.94. What is the pmf of r.v. X whose probability generating function is
$G_X(z) = \dfrac{1}{2 - z}$?


4.95. Let Y = aX + b. Express the probability generating function of Y,
$G_Y(z)$, in terms of the probability generating function of X, $G_X(z)$.


4.96. Let the moment generating function of a discrete r.v. X be given by

$$M_X(t) = 0.25e^{t} + 0.35e^{3t} + 0.40e^{5t}$$

Find P(X = 3).


4.97. Let X be a geometric r.v. with parameter p.
(a) Determine the moment generating function of X.
(b) Find the mean of X for $p = \frac{2}{3}$.


4.98. Let X be a uniform r.v. over (a, b).
(a) Determine the moment generating function of X.
(b) Using the result of (a), find E(X), $E(X^2)$, and $E(X^3)$.


4.99. Consider a r.v. X with pdf

$$f_X(x) = \frac{1}{\sqrt{32\pi}}\, e^{-(x+7)^2/32} \qquad -\infty < x < \infty$$

Find the moment generating function of X.


4.100. Let X = N(0; 1). Using the moment generating function of X, determine
$E(X^n)$.


4.101. Let X and Y be independent binomial r.v.'s with parameters (n, p) and
(m, p), respectively. Let Z = X + Y. What is the distribution of Z?


4.102. Let (X, Y) be a continuous bivariate r.v. with joint pdf

$$f_{XY}(x, y) = \begin{cases} e^{-(x+y)} & x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases}$$

(a) Find the joint moment generating function of X and Y.
(b) Find the joint moments $m_{10}$, $m_{01}$, and $m_{11}$.


4.103. Let (X, Y) be a bivariate normal r.v. defined by Eq. (3.88). Find the joint 
moment generating function of X and Y. 


4.104. Let $X_1, \ldots, X_n$ be n independent r.v.'s and $X_i > 0$. Let

$$Y = X_1 \cdot X_2 \cdots X_n = \prod_{i=1}^{n} X_i$$

Show that for large n, the pdf of Y is approximately log-normal.


4.105. Let $Y = (X - \lambda)/\sqrt{\lambda}$, where X is a Poisson r.v. with
parameter $\lambda$. Show that Y ≈ N(0; 1) when $\lambda$ is sufficiently large.


4.106. Consider an experiment of tossing a fair coin 1000 times. Find the
probability of obtaining more than 520 heads (a) by using formula (4.178),
and (b) by formula (4.181).


4.107. The number of cars entering a parking lot is Poisson distributed with a rate 
of 100 cars per hour. Find the time required for more than 200 cars to have 
entered the parking lot with probability 0.90 (a) by using formula (4.182), 
and (b) by formula (4.185). 


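ANSWERS TO SUPPLEMENTARY PROBLEMS


4.87. $f_Y(y) = \begin{cases} \frac{1}{6} & 1 < y < 7 \\ 0 & \text{otherwise} \end{cases}$


4.88. $f_Y(y) = \begin{cases} f_X(y) + f_X(-y) & y > 0 \\ 0 & y < 0 \end{cases}$


4.89. $f_Y(y) = \begin{cases} \dfrac{1}{\pi\sqrt{1 - y^2}} & -1 < y < 1 \\ 0 & \text{otherwise} \end{cases}$


4.90. $f_Z(z) = \begin{cases} z & 0 < z < 1 \\ -z + 2 & 1 < z < 2 \\ 0 & \text{otherwise} \end{cases}$
      $f_W(w) = \begin{cases} w + 1 & -1 < w < 0 \\ -w + 1 & 0 < w < 1 \\ 0 & \text{otherwise} \end{cases}$
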
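4.91.
(a) $f_Z(z) = \begin{cases} \dfrac{\alpha\beta}{\alpha+\beta}\, e^{-\alpha z} & z > 0 \\ \dfrac{\alpha\beta}{\alpha+\beta}\, e^{\beta z} & z < 0 \end{cases}$
(b) $f_Z(z) = \begin{cases} \dfrac{\alpha\beta}{(\alpha z + \beta)^2} & z > 0 \\ 0 & z < 0 \end{cases}$
(c) $f_Z(z) = \begin{cases} \alpha e^{-\alpha z}(1 - e^{-\beta z}) + \beta e^{-\beta z}(1 - e^{-\alpha z}) & z > 0 \\ 0 & z < 0 \end{cases}$
(d) $f_Z(z) = \begin{cases} (\alpha + \beta)\, e^{-(\alpha+\beta) z} & z > 0 \\ 0 & z < 0 \end{cases}$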
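4.92. 3


4.93.
(a) $\mathrm{Cov}(X, Y) = \begin{cases} \dfrac{1}{n+2} & n = \text{odd} \\ 0 & n = \text{even} \end{cases}$
(b) $\rho_{XY} = \begin{cases} \dfrac{\sqrt{3(2n+1)}}{n+2} & n = \text{odd} \\ 0 & n = \text{even} \end{cases}$


4.94. $p_X(x) = \left(\frac{1}{2}\right)^{x+1}$


4.95. $G_Y(z) = z^b\, G_X(z^a)$.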


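4.96. 0.35


4.97. (a) $M_X(t) = \dfrac{pe^t}{1 - qe^t}$, $t < -\ln q$, $q = 1 - p$   (b) $E(X) = \frac{3}{2}$


4.98.
(a) $M_X(t) = \dfrac{e^{tb} - e^{ta}}{t(b - a)}$
(b) $E(X) = \frac{1}{2}(b + a)$, $E(X^2) = \frac{1}{3}(b^2 + ab + a^2)$, $E(X^3) = \frac{1}{4}(b^3 + b^2 a + ba^2 + a^3)$


4.99. $M_X(t) = e^{-7t + 8t^2}$


4.100. $E(X^n) = \begin{cases} 0 & n = 1, 3, 5, \ldots \\ 1 \cdot 3 \cdots (n-1) & n = 2, 4, 6, \ldots \end{cases}$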
4.101. Hint: Use the moment generating functions. 


Z is a binomial r.v. with parameters (n + m, p). 


4.102. (a) $M_{XY}(t_1, t_2) = \dfrac{1}{(1 - t_1)(1 - t_2)}$   (b) $m_{10} = 1$, $m_{01} = 1$, $m_{11} = 1$


4.103. $M_{XY}(t_1, t_2) = \exp\left[t_1\mu_X + t_2\mu_Y + \frac{1}{2}\left(t_1^2\sigma_X^2 + 2t_1 t_2 \sigma_X\sigma_Y\rho + t_2^2\sigma_Y^2\right)\right]$


4.104. Hint: Take the natural logarithm of Y and use the central limit theorem and
the result of Prob. 4.10.


4.105. Hint: Find the moment generating function of Y and let $\lambda \to \infty$.


4.106. (a) 0.1038 (b) 0.0974 


4.107. (a) 2.189 h (b) 2.1946 h 


CHAPTER 5 


Random Processes 


5.1 Introduction 


In this chapter, we introduce the concept of a random (or stochastic) process. 
The theory of random processes was first developed in connection with the 
study of fluctuations and noise in physical systems. A random process is the 
mathematical model of an empirical process whose development is governed 
by probability laws. Random processes provide useful models for the
studies of such diverse fields as statistical physics, communication and 
control, time series analysis, population growth, and management sciences. 


5.2 Random Processes 


A. Definition: 


A random process is a family of r.v.'s $\{X(t), t \in T\}$ defined on a given
probability space, indexed by the parameter t, where t varies over an index
set T.

Recall that a random variable is a function defined on the sample space S
(Sec. 2.2). Thus, a random process $\{X(t), t \in T\}$ is really a function
of two arguments $\{X(t, \zeta), t \in T, \zeta \in S\}$. For a fixed
t (= $t_k$), $X(t_k, \zeta) = X_k(\zeta)$ is a r.v. denoted by $X(t_k)$, as
$\zeta$ varies over the sample space S. On the other hand, for a fixed sample
point $\zeta_i \in S$, $X(t, \zeta_i) = X_i(t)$ is a single function of
time t, called a sample function or a realization of the process. (See
Fig. 5-1.) The totality of all sample functions is called an ensemble.


Of course if both $\zeta$ and t are fixed, $X(t_k, \zeta_i)$ is simply a real
number. In the following we use the notation X(t) to represent $X(t, \zeta)$.


Fig. 5-1 Random process.


B. Description of Random Process: 


In a random process $\{X(t), t \in T\}$, the index set T is called the
parameter set of the random process. The values assumed by X(t) are called
states, and the set of all possible values forms the state space E of the
random process. If the index set T of a random process is discrete, then the
process is called a discrete-parameter (or discrete-time) process. A
discrete-parameter process is also called a random sequence and is denoted by
$\{X_n, n = 1, 2, \ldots\}$. If T is continuous, then we have a
continuous-parameter (or continuous-time) process. If the state space E of a
random process is discrete, then the process is called a discrete-state
process, often referred to as a chain. In this case, the state space E is
often assumed to be {0, 1, 2, ...}. If the state space E is continuous, then
we have a continuous-state process.

A complex random process X(t) is defined by

$$X(t) = X_1(t) + jX_2(t)$$

where $X_1(t)$ and $X_2(t)$ are (real) random processes and $j = \sqrt{-1}$.


Throughout this book, all random processes are real random processes 
unless specified otherwise. 


5.3 Characterization of Random Processes 
A. Probabilistic Descriptions: 


Consider a random process X(t). For a fixed time $t_1$, $X(t_1) = X_1$ is a
r.v., and its cdf $F_X(x_1; t_1)$ is defined as

$$F_X(x_1; t_1) = P\{X(t_1) \le x_1\}$$  (5.1)

$F_X(x_1; t_1)$ is known as the first-order distribution of X(t). Similarly,
given $t_1$ and $t_2$, $X(t_1) = X_1$ and $X(t_2) = X_2$ represent two r.v.'s.
Their joint distribution is known as the second-order distribution of X(t)
and is given by

$$F_X(x_1, x_2; t_1, t_2) = P\{X(t_1) \le x_1, X(t_2) \le x_2\}$$  (5.2)

In general, we define the nth-order distribution of X(t) by

$$F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = P\{X(t_1) \le x_1, \ldots, X(t_n) \le x_n\}$$  (5.3)

If X(t) is a discrete-state process, then X(t) is specified by a collection of
pmf's:

$$p_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = P\{X(t_1) = x_1, \ldots, X(t_n) = x_n\}$$  (5.4)

If X(t) is a continuous-state process, then X(t) is specified by a collection
of pdf's:

$$f_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = \frac{\partial^n F_X(x_1, \ldots, x_n; t_1, \ldots, t_n)}{\partial x_1 \cdots \partial x_n}$$  (5.5)

The complete characterization of X(t) requires knowledge of all the
distributions as $n \to \infty$. Fortunately, often much less is sufficient.


B. Mean, Correlation, and Covariance Functions: 


As in the case of r.v.'s, random processes are often described by using
statistical averages. The mean of X(t) is defined by

$$\mu_X(t) = E[X(t)]$$  (5.6)

where X(t) is treated as a random variable for a fixed value of t. In general,
$\mu_X(t)$ is a function of time, and it is often called the ensemble average
of X(t). A measure of dependence among the r.v.'s of X(t) is provided by its
autocorrelation function, defined by

$$R_X(t, s) = E[X(t)X(s)]$$  (5.7)

Note that

$$R_X(t, s) = R_X(s, t)$$  (5.8)

and

$$R_X(t, t) = E[X^2(t)]$$  (5.9)

The autocovariance function of X(t) is defined by

$$K_X(t, s) = \mathrm{Cov}[X(t), X(s)] = E\{[X(t) - \mu_X(t)][X(s) - \mu_X(s)]\} = R_X(t, s) - \mu_X(t)\mu_X(s)$$  (5.10)

It is clear that if the mean of X(t) is zero, then $K_X(t, s) = R_X(t, s)$.
Note that the variance of X(t) is given by

$$\sigma_X^2(t) = \mathrm{Var}[X(t)] = E\{[X(t) - \mu_X(t)]^2\} = K_X(t, t)$$  (5.11)


If X(t) is a complex random process, then its autocorrelation function
$R_X(t, s)$ and autocovariance function $K_X(t, s)$ are defined, respectively,
by

$$R_X(t, s) = E[X(t)X^*(s)]$$  (5.12)

and

$$K_X(t, s) = E\{[X(t) - \mu_X(t)][X(s) - \mu_X(s)]^*\}$$  (5.13)

where * denotes the complex conjugate.
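As a small illustration of the mean, autocorrelation, and autocovariance
functions of a real process, consider the following Python sketch. Everything
in it is assumed for the example: a sample space with three outcomes, their
probabilities, and a process $X(t, \zeta) = A(\zeta)\cos t$. Because the
sample space is finite, the ensemble averages are plain weighted sums.

```python
import math

# Hypothetical finite sample space: outcome -> probability (assumed values).
PROB = {"z1": 0.5, "z2": 0.3, "z3": 0.2}
# Hypothetical amplitudes A(zeta); the process is X(t, zeta) = A(zeta) * cos(t).
AMP = {"z1": -1.0, "z2": 0.0, "z3": 2.0}

def X(t, zeta):
    """Value of the process at time t for the outcome zeta."""
    return AMP[zeta] * math.cos(t)

def mean(t):
    """Ensemble average mu_X(t) = E[X(t)]."""
    return sum(pr * X(t, z) for z, pr in PROB.items())

def autocorr(t, s):
    """Autocorrelation R_X(t, s) = E[X(t) X(s)]."""
    return sum(pr * X(t, z) * X(s, z) for z, pr in PROB.items())

def autocov(t, s):
    """Autocovariance K_X(t, s) = R_X(t, s) - mu_X(t) mu_X(s)."""
    return autocorr(t, s) - mean(t) * mean(s)

# Fixing zeta gives a sample function of t; fixing t gives a r.v. over zeta.
print([round(X(t, "z3"), 3) for t in (0.0, 1.0, 2.0)])
print(mean(0.0), autocorr(0.0, 1.0), autocov(0.0, 0.0))
```

Here `autocov(t, t)` is the variance of X(t), and `autocorr` is symmetric in
its two arguments, in line with the relations stated above.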


5.4 Classification of Random Processes 


If a random process X(t) possesses some special probabilistic structure, we 
can specify less to characterize X(t) completely. Some simple random 
processes are characterized completely by only the first- and second-order
distributions. 


A. Stationary Processes: 


A random process $\{X(t), t \in T\}$ is said to be stationary or strict-sense
stationary if, for all n and for every set of time instants
$\{t_i \in T, i = 1, 2, \ldots, n\}$,

$$F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = F_X(x_1, \ldots, x_n; t_1 + \tau, \ldots, t_n + \tau)$$  (5.14)

for any $\tau$. Hence, the distribution of a stationary process will be
unaffected by a shift in the time origin, and X(t) and $X(t + \tau)$ will have
the same distributions for any $\tau$. Thus, for the first-order distribution,

$$F_X(x; t) = F_X(x; t + \tau) = F_X(x)$$  (5.15)

and

$$f_X(x; t) = f_X(x)$$  (5.16)

Then

$$\mu_X(t) = E[X(t)] = \mu$$  (5.17)

$$\mathrm{Var}[X(t)] = \sigma^2$$  (5.18)

where $\mu$ and $\sigma^2$ are constants. Similarly, for the second-order
distribution,

$$F_X(x_1, x_2; t_1, t_2) = F_X(x_1, x_2; t_2 - t_1)$$  (5.19)

and

$$f_X(x_1, x_2; t_1, t_2) = f_X(x_1, x_2; t_2 - t_1)$$  (5.20)

Nonstationary processes are characterized by distributions depending on
the points $t_1, t_2, \ldots, t_n$.


B. Wide-Sense Stationary Processes: 


If stationary condition (5.14) of a random process X(t) does not hold for all
n but holds for $n \le k$, then we say that the process X(t) is stationary to
order k. If X(t) is stationary to order 2, then X(t) is said to be wide-sense
stationary (WSS) or weak stationary. If X(t) is a WSS random process, then we
have

1. $E[X(t)] = \mu$ (constant)  (5.21)
2. $R_X(t, s) = E[X(t)X(s)] = R_X(|s - t|)$  (5.22)


Note that a strict-sense stationary process is also a WSS process, but, in 
general, the converse is not true. 


C. Independent Processes: 


In a random process X(t), if $X(t_i)$ for i = 1, 2, ..., n are independent
r.v.'s, so that for n = 2, 3, ...,

$$F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = \prod_{i=1}^{n} F_X(x_i; t_i)$$  (5.23)

then we call X(t) an independent random process. Thus, a first-order
distribution is sufficient to characterize an independent random process X(t).


D. Processes with Stationary Independent Increments: 
A random process $\{X(t), t \ge 0\}$ is said to have independent increments if
whenever $0 < t_1 < t_2 < \cdots < t_n$,

$$X(0),\ X(t_1) - X(0),\ X(t_2) - X(t_1),\ \ldots,\ X(t_n) - X(t_{n-1})$$

are independent. If $\{X(t), t \ge 0\}$ has independent increments and
X(t) − X(s) has the same distribution as X(t + h) − X(s + h) for all
$s, t, h \ge 0$, s < t, then the process X(t) is said to have stationary
independent increments.

Let $\{X(t), t \ge 0\}$ be a random process with stationary independent
increments and assume that X(0) = 0. Then (Probs. 5.21 and 5.22)

$$E[X(t)] = \mu_1 t$$  (5.24)

where $\mu_1 = E[X(1)]$ and

$$\mathrm{Var}[X(t)] = \sigma_1^2 t$$  (5.25)

where $\sigma_1^2 = \mathrm{Var}[X(1)]$.

From Eq. (5.24), we see that processes with stationary independent 
increments are nonstationary. Examples of processes with stationary 
independent increments are Poisson processes and Wiener processes, which 
are discussed in later sections. 
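The linear growth of the mean and variance in Eqs. (5.24) and (5.25) can be
checked exactly in a discrete-time analogue. The sketch below is an
illustration under assumptions: time is restricted to the integers, and each
increment is +1 with probability 0.6 and −1 with probability 0.4 (values
chosen for the example). The pmf of X(n) is built by repeated convolution with
the step pmf, starting from X(0) = 0.

```python
def add_increment(pmf, step):
    """pmf of X(n+1) = X(n) + increment, with the increment independent of X(n)."""
    out = {}
    for x, px in pmf.items():
        for d, pd in step.items():
            out[x + d] = out.get(x + d, 0.0) + px * pd
    return out

def walk_pmf(n, step):
    """pmf of X(n) for a walk with i.i.d. increments and X(0) = 0."""
    pmf = {0: 1.0}
    for _ in range(n):
        pmf = add_increment(pmf, step)
    return pmf

def mean_var(pmf):
    """Mean and variance of a pmf given as {value: probability}."""
    m = sum(x * px for x, px in pmf.items())
    v = sum((x - m) ** 2 * px for x, px in pmf.items())
    return m, v

STEP = {+1: 0.6, -1: 0.4}           # assumed increment distribution
mu1, var1 = mean_var(STEP)          # E[X(1)] and Var[X(1)]
for n in (1, 5, 10):
    m, v = mean_var(walk_pmf(n, STEP))
    print(n, m, mu1 * n, v, var1 * n)   # m matches mu1*n, v matches var1*n
```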


E. Markov Processes: 


A random process $\{X(t), t \in T\}$ is said to be a Markov process if

$$P\{X(t_{n+1}) \le x_{n+1} \mid X(t_1) = x_1, X(t_2) = x_2, \ldots, X(t_n) = x_n\} = P\{X(t_{n+1}) \le x_{n+1} \mid X(t_n) = x_n\}$$  (5.26)

whenever $t_1 < t_2 < \cdots < t_n < t_{n+1}$.


A discrete-state Markov process is called a Markov chain. For a
discrete-parameter Markov chain $\{X_n, n \ge 0\}$ (see Sec. 5.5), we have for
every n

$$P(X_{n+1} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_n = i) = P(X_{n+1} = j \mid X_n = i)$$  (5.27)

Equation (5.26) or Eq. (5.27) is referred to as the Markov property (which is
also known as the memoryless property). This property of a Markov process
states that the future state of the process depends only on the present state
and not on the past history. Clearly, any process with independent
increments is a Markov process.
Using the Markov property, the nth-order distribution of a Markov
process X(t) can be expressed as (Prob. 5.25)

$$F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = F_X(x_1; t_1) \prod_{k=2}^{n} P\{X(t_k) \le x_k \mid X(t_{k-1}) = x_{k-1}\}$$  (5.28)

Thus, all finite-order distributions of a Markov process can be expressed in
terms of the second-order distributions.


F. Normal Processes: 


A random process $\{X(t), t \in T\}$ is said to be a normal (or Gaussian)
process if for any integer n and any subset $\{t_1, \ldots, t_n\}$ of T, the n
r.v.'s $X(t_1), \ldots, X(t_n)$ are jointly normally distributed in the sense
that their joint characteristic function is given by

$$\Psi_{X(t_1) \cdots X(t_n)}(\omega_1, \ldots, \omega_n) = E\{\exp j[\omega_1 X(t_1) + \cdots + \omega_n X(t_n)]\}$$
$$= \exp\left\{ j\sum_{i=1}^{n} \omega_i E[X(t_i)] - \frac{1}{2} \sum_{i=1}^{n}\sum_{k=1}^{n} \omega_i \omega_k\, \mathrm{Cov}[X(t_i), X(t_k)] \right\}$$  (5.29)

where $\omega_1, \ldots, \omega_n$ are any real numbers (see Probs. 5.59 and
5.60). Equation (5.29) shows that a normal process is completely characterized
by the second-order distributions. Thus, if a normal process is wide-sense
stationary, then it is also strictly stationary.


G. Ergodic Processes: 


Consider a random process $\{X(t), -\infty < t < \infty\}$ with a typical
sample function x(t). The time average of x(t) is defined as

$$\bar{x} = \langle x(t) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t)\, dt$$  (5.30)

Similarly, the time autocorrelation function $\bar{R}_X(\tau)$ of x(t) is
defined as

$$\bar{R}_X(\tau) = \langle x(t)x(t + \tau) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t)x(t + \tau)\, dt$$  (5.31)

A random process is said to be ergodic if it has the property that the time
averages of sample functions of the process are equal to the corresponding
statistical or ensemble averages. The subject of ergodicity is extremely
complicated. However, in most physical applications, it is assumed that
stationary processes are ergodic.
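The time averages above can be approximated numerically for a single sample
function. The sketch below assumes, purely for illustration, the process
$X(t) = \cos(t + \Theta)$ with $\Theta$ uniform over $(0, 2\pi)$, whose ensemble
mean is 0 and whose ensemble autocorrelation is $\frac{1}{2}\cos\tau$. One
sample function (one fixed phase, here 0.7) is averaged over a long but finite
window with a midpoint rule, so the limits are only approximated.

```python
import math

def time_average(x, T, steps=200_000):
    """Midpoint-rule estimate of (1/T) * integral of x(t) over [-T/2, T/2]."""
    dt = T / steps
    total = sum(x(-T / 2 + (k + 0.5) * dt) for k in range(steps))
    return total * dt / T

def time_autocorrelation(x, tau, T, steps=200_000):
    """Finite-window estimate of the time average of x(t) * x(t + tau)."""
    return time_average(lambda t: x(t) * x(t + tau), T, steps)

theta = 0.7                                  # assumed phase of one sample function
sample = lambda t: math.cos(t + theta)

T = 2000.0                                   # finite window standing in for T -> infinity
print(time_average(sample, T))               # close to the ensemble mean 0
print(time_autocorrelation(sample, 1.0, T))  # close to 0.5 * cos(1.0)
print(0.5 * math.cos(1.0))
```

For this assumed process the time averages of one realization agree with the
ensemble averages up to an error of order 1/T, which is the behavior the
definition of ergodicity asks for.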


5.5 Discrete-Parameter Markov Chains 


In this section we treat a discrete-parameter Markov chain
$\{X_n, n \ge 0\}$ with a discrete state space E = {0, 1, 2, ...}, where this
set may be finite or infinite. If $X_n = i$, then the Markov chain is said to
be in state i at time n (or the nth step). A discrete-parameter Markov chain
$\{X_n, n \ge 0\}$ is characterized by [Eq. (5.27)]

$$P(X_{n+1} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_n = i) = P(X_{n+1} = j \mid X_n = i)$$  (5.32)

where $P\{X_{n+1} = j \mid X_n = i\}$ are known as one-step transition
probabilities. If $P\{X_{n+1} = j \mid X_n = i\}$ is independent of n, then the
Markov chain is said to possess stationary transition probabilities and the
process is referred to as a homogeneous Markov chain. Otherwise the process is
known as a nonhomogeneous Markov chain. Note that the concepts of a Markov
chain's having stationary transition probabilities and being a stationary
random process should not be confused. The Markov process, in general, is not
stationary. We shall consider only homogeneous Markov chains in this
section.


A. Transition Probability Matrix: 


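Let $\{X_n, n \ge 0\}$ be a homogeneous Markov chain with a discrete infinite
state space E = {0, 1, 2, ...}. Then

$$p_{ij} = P\{X_{n+1} = j \mid X_n = i\} \qquad i \ge 0,\ j \ge 0$$  (5.33)

regardless of the value of n. A transition probability matrix of
$\{X_n, n \ge 0\}$ is defined by

$$P = [p_{ij}] = \begin{bmatrix} p_{00} & p_{01} & p_{02} & \cdots \\ p_{10} & p_{11} & p_{12} & \cdots \\ p_{20} & p_{21} & p_{22} & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix}$$

where the elements satisfy

$$p_{ij} \ge 0 \qquad \sum_{j=0}^{\infty} p_{ij} = 1 \qquad i = 0, 1, 2, \ldots$$  (5.34)

In the case where the state space E is finite and equal to {1, 2, ..., m}, P
is m × m dimensional; that is,

$$P = [p_{ij}] = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix}$$

where

$$p_{ij} \ge 0 \qquad \sum_{j=1}^{m} p_{ij} = 1 \qquad i = 1, 2, \ldots, m$$  (5.35)

A square matrix whose elements satisfy Eq. (5.34) or (5.35) is called a
Markov matrix or stochastic matrix.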
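For a finite state space, condition (5.35) is easy to check mechanically. The
following Python sketch is a minimal check; the function name, the tolerance,
and the example matrices are assumptions made for illustration.

```python
def is_stochastic(P, tol=1e-12):
    """True if P is square, has nonnegative entries, and every row sums to 1."""
    n = len(P)
    return all(
        len(row) == n
        and all(p >= 0 for p in row)
        and abs(sum(row) - 1.0) <= tol
        for row in P
    )

# Example matrices (assumed values).
print(is_stochastic([[0.9, 0.1], [0.5, 0.5]]))   # rows sum to 1
print(is_stochastic([[0.9, 0.2], [0.5, 0.5]]))   # first row sums to 1.1
```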


B. Higher-Order Transition Probabilities—Chapman- 
Kolmogorov Equation: 


Tractability of Markov chain models is based on the fact that the probability
distribution of $\{X_n, n \ge 0\}$ can be computed by matrix manipulations.


Let $P = [p_{ij}]$ be the transition probability matrix of a Markov chain
$\{X_n, n \ge 0\}$. Matrix powers of P are defined by

$$P^2 = PP$$

with the (i, j)th element given by

$$p_{ij}^{(2)} = \sum_{k} p_{ik}\, p_{kj}$$

Note that when the state space E is infinite, the series above converges, since
by Eq. (5.34),

$$\sum_{k} p_{ik}\, p_{kj} \le \sum_{k} p_{ik} = 1$$

Similarly, $P^3 = PP^2$ has the (i, j)th element

$$p_{ij}^{(3)} = \sum_{k} p_{ik}\, p_{kj}^{(2)}$$

and in general, $P^{n+1} = PP^n$ has the (i, j)th element

$$p_{ij}^{(n+1)} = \sum_{k} p_{ik}\, p_{kj}^{(n)}$$  (5.36)

Finally, we define $P^0 = I$, where I is the identity matrix.


The n-step transition probabilities for the homogeneous Markov chain
$\{X_n, n \ge 0\}$ are defined by

$$P(X_n = j \mid X_0 = i)$$

Then we can show that (Prob. 5.90)

$$p_{ij}^{(n)} = P(X_n = j \mid X_0 = i)$$  (5.37)

We compute $p_{ij}^{(n)}$ by taking matrix powers.


The matrix identity

$$P^{n+m} = P^n P^m \qquad n, m \ge 0$$

when written in terms of elements

$$p_{ij}^{(n+m)} = \sum_{k} p_{ik}^{(n)}\, p_{kj}^{(m)}$$  (5.38)

is known as the Chapman-Kolmogorov equation. It expresses the fact that a
transition from i to j in n + m steps can be achieved by moving from i to an
intermediate k in n steps (with probability $p_{ik}^{(n)}$), and then
proceeding to j from k in m steps (with probability $p_{kj}^{(m)}$).
Furthermore, the events "go from i to k in n steps" and "go from k to j in m
steps" are independent. Hence, the probability of the transition from i to j
in n + m steps via i, k, j is $p_{ik}^{(n)} p_{kj}^{(m)}$. Finally, the
probability of the transition from i to j is obtained by summing
over the intermediate state k.
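The Chapman-Kolmogorov equation can be verified numerically for a small chain.
The sketch below uses plain Python lists; the three-state matrix is an assumed
example, and `mat_mul` and `mat_pow` are helper names introduced here.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def mat_pow(P, n):
    """n-th matrix power of P, with P^0 defined as the identity matrix."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Assumed three-state transition probability matrix.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

lhs = mat_pow(P, 5)                          # elements p_ij^(5)
rhs = mat_mul(mat_pow(P, 2), mat_pow(P, 3))  # sum over k of p_ik^(2) p_kj^(3)
print(max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3)))
print(mat_pow(P, 2)[0][2])                   # two-step probability from state 0 to state 2
```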


C. The Probability Distribution of $\{X_n, n \ge 0\}$:
Let $p_i(n) = P(X_n = i)$ and

$$\mathbf{p}(n) = [p_0(n)\ \ p_1(n)\ \ p_2(n)\ \ \cdots]$$

where

$$\sum_{k} p_k(n) = 1$$

Then $p_i(0) = P(X_0 = i)$ are the initial-state probabilities,

$$\mathbf{p}(0) = [p_0(0)\ \ p_1(0)\ \ p_2(0)\ \ \cdots]$$

is called the initial-state probability vector, and p(n) is called the state
probability vector after n transitions or the probability distribution of
$X_n$.


Now it can be shown that (Prob. 5.29)

$$\mathbf{p}(n) = \mathbf{p}(0)P^n$$  (5.39)


which indicates that the probability distribution of a homogeneous Markov 
chain is completely determined by the one-step transition probability matrix 
P and the initial-state probability vector p(0). 
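Equation (5.39) amounts to n successive row-vector-by-matrix products. A
minimal sketch, assuming a two-state chain with made-up transition
probabilities and the chain started in state 0:

```python
def step(p, P):
    """One transition: the row vector p times the matrix P."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]

def distribution(p0, P, n):
    """State probability vector p(n) = p(0) P^n, computed step by step."""
    p = list(p0)
    for _ in range(n):
        p = step(p, P)
    return p

P = [[0.9, 0.1], [0.5, 0.5]]    # assumed transition probability matrix
p0 = [1.0, 0.0]                 # assumed initial-state probability vector

for n in range(4):
    print(n, distribution(p0, P, n))
```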


D. Classification of States: 


1. Accessible States:
State j is said to be accessible from state i if for some $n \ge 0$,
$p_{ij}^{(n)} > 0$, and we write $i \to j$. Two states i and j accessible to
each other are said to communicate, and we write $i \leftrightarrow j$. If all
states communicate with each other, then we say that the Markov chain is
irreducible.
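For a finite chain, accessibility depends only on which one-step probabilities
are positive, so it can be decided by a graph search. The sketch below is one
possible way to do this; the function names and the two example matrices are
assumptions.

```python
def accessible(P, i):
    """Set of states j with i -> j, i.e. p_ij^(n) > 0 for some n >= 0."""
    seen = {i}                  # n = 0: every state is accessible from itself
    stack = [i]
    while stack:
        k = stack.pop()
        for j, p_kj in enumerate(P[k]):
            if p_kj > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def communicate(P, i, j):
    """True if i and j are accessible from each other."""
    return j in accessible(P, i) and i in accessible(P, j)

def is_irreducible(P):
    """True if all states communicate with each other."""
    everything = set(range(len(P)))
    return all(accessible(P, i) == everything for i in range(len(P)))

# Assumed examples: a cyclic chain and a chain with an absorbing state 2.
CYCLE = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
ABSORB = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]

print(is_irreducible(CYCLE), is_irreducible(ABSORB))
print(accessible(ABSORB, 2), communicate(ABSORB, 0, 1))
```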


2. Recurrent States: 

Let $T_j$ be the time (or the number of steps) of the first visit to state j
after time zero, unless state j is never visited, in which case we set
$T_j = \infty$. Then $T_j$ is a discrete r.v. taking values in
$\{1, 2, \ldots, \infty\}$.


Let

$$f_{ij}^{(m)} = P(T_j = m \mid X_0 = i) = P(X_m = j, X_k \ne j, k = 1, 2, \ldots, m - 1 \mid X_0 = i)$$  (5.40)

and $f_{ij}^{(0)} = 0$ since $T_j \ge 1$. Then

$$f_{ij}^{(1)} = P(T_j = 1 \mid X_0 = i) = P(X_1 = j \mid X_0 = i) = p_{ij}$$  (5.41)

and

$$f_{ij}^{(m)} = \sum_{k \ne j} p_{ik}\, f_{kj}^{(m-1)} \qquad m = 2, 3, \ldots$$  (5.42)

The probability of visiting j in finite time, starting from i, is given by

$$f_{ij} = \sum_{n=0}^{\infty} f_{ij}^{(n)} = P(T_j < \infty \mid X_0 = i)$$  (5.43)

Now state j is said to be recurrent if

$$f_{jj} = P(T_j < \infty \mid X_0 = j) = 1$$  (5.44)

That is, starting from j, the probability of eventual return to j is one. A
recurrent state j is said to be positive recurrent if

$$E(T_j \mid X_0 = j) < \infty$$  (5.45)

and state j is said to be null recurrent if

$$E(T_j \mid X_0 = j) = \infty$$  (5.46)

Note that

$$E(T_j \mid X_0 = j) = \sum_{n=0}^{\infty} n f_{jj}^{(n)}$$  (5.47)

3. Transient States: 


State j is said to be transient (or nonrecurrent) if

$$f_{jj} = P(T_j < \infty \mid X_0 = j) < 1$$  (5.48)

In this case there is positive probability of never returning to state j.


4. Periodic and Aperiodic States: 
We define the period of state j to be

$$d(j) = \gcd\{n \ge 1: p_{jj}^{(n)} > 0\}$$

where gcd stands for greatest common divisor.
If d(j) > 1, then state j is called periodic with period d(j). If d(j) = 1,
then state j is called aperiodic. Note that whenever $p_{jj} > 0$, j is
aperiodic.
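The period can be estimated for a small finite chain by taking the gcd of the
return times found within a finite horizon. This is only a sketch: the cutoff
`horizon` is an assumption (the definition ranges over all $n \ge 1$), the
example matrices are made up, and the function returns 0 when no return is
seen within the horizon.

```python
from math import gcd

def period(P, j, horizon=50):
    """gcd of all n in 1..horizon with p_jj^(n) > 0 (0 if there is no such n)."""
    size = len(P)
    row = [1.0 if k == j else 0.0 for k in range(size)]   # row j of P^0
    d = 0
    for n in range(1, horizon + 1):
        # Advance from row j of P^(n-1) to row j of P^n.
        row = [sum(row[k] * P[k][m] for k in range(size)) for m in range(size)]
        if row[j] > 0:
            d = gcd(d, n)       # gcd(0, n) == n, so the first return time seeds d
    return d

CYCLE = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]     # returns only at n = 3, 6, 9, ...
FLIP = [[0, 1], [1, 0]]                       # returns only at even n
LAZY = [[0.5, 0.5], [1.0, 0.0]]               # p_00 > 0

print(period(CYCLE, 0), period(FLIP, 0), period(LAZY, 0), period(LAZY, 1))
```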


5. Absorbing States: 
State j is said to be an absorbing state if $p_{jj} = 1$; that is, once state
j is reached, it is never left.


E. Absorption Probabilities: 


Consider a Markov chain $X(n) = \{X_n, n \ge 0\}$ with finite state space
E = {1, 2, ..., N} and transition probability matrix P. Let A = {1, ..., m} be
the set of absorbing states and B = {m + 1, ..., N} be a set of nonabsorbing
states. Then the transition probability matrix P can be expressed as

$$P = \begin{bmatrix} I & 0 \\ R & Q \end{bmatrix}$$  (5.49a)

where I is an m × m identity matrix, 0 is an m × (N − m) zero matrix, and

$$R = \begin{bmatrix} p_{m+1,1} & \cdots & p_{m+1,m} \\ \vdots & & \vdots \\ p_{N,1} & \cdots & p_{N,m} \end{bmatrix} \qquad Q = \begin{bmatrix} p_{m+1,m+1} & \cdots & p_{m+1,N} \\ \vdots & & \vdots \\ p_{N,m+1} & \cdots & p_{N,N} \end{bmatrix}$$  (5.49b)

Note that the elements of R are the one-step transition probabilities from
nonabsorbing to absorbing states, and the elements of Q are the one-step
transition probabilities among the nonabsorbing states.

Let $U = [u_{kj}]$, where

$$u_{kj} = P\{X_n = j (\in A) \mid X_0 = k (\in B)\}$$

It is seen that U is an (N − m) × m matrix and its elements are the absorption
probabilities for the various absorbing states. Then it can be shown that
(Prob. 5.40)

$$U = (I - Q)^{-1} R = \Phi R$$  (5.50)

The matrix $\Phi = (I - Q)^{-1}$ is known as the fundamental matrix of the
Markov chain X(n). Let $T_k$ denote the total time units (or steps) to
absorption from state k. Let

$$\mathbf{T} = [T_{m+1}\ \ T_{m+2}\ \ \cdots\ \ T_N]$$

Then it can be shown that (Prob. 5.74)

$$E(T_k) = \sum_{i=m+1}^{N} \phi_{ki} \qquad k = m + 1, \ldots, N$$  (5.51)

where $\phi_{ki}$ is the (k, i)th element of the fundamental matrix $\Phi$.
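Equations (5.50) and (5.51) can be evaluated exactly for a small chain. The
sketch below assumes a fair gambler's-ruin chain with capital 0, 1, 2, 3, in
which capitals 0 and 3 are absorbing and capitals 1 and 2 are nonabsorbing.
The matrices R and Q are written out by hand for this assumed example, and
the Gauss-Jordan helper `inverse` is introduced here for illustration.

```python
from fractions import Fraction

def inverse(A):
    """Exact inverse by Gauss-Jordan elimination on the augmented matrix [A | I]."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)  # assumes A is nonsingular
        M[c], M[pivot] = M[pivot], M[c]
        pv = M[c][c]
        M[c] = [x / pv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

half = Fraction(1, 2)
R = [[half, 0], [0, half]]      # rows: capital 1, 2; columns: absorbing capitals 0, 3
Q = [[0, half], [half, 0]]      # transitions among the nonabsorbing capitals 1, 2

size = len(Q)
I_minus_Q = [[(1 if i == j else 0) - Q[i][j] for j in range(size)] for i in range(size)]
Phi = inverse(I_minus_Q)                      # fundamental matrix
U = mat_mul(Phi, R)                           # absorption probabilities, Eq. (5.50)
expected_steps = [sum(row) for row in Phi]    # E(T_k), Eq. (5.51)

print(U)
print(expected_steps)
```

Exact rational arithmetic is used so that the result can be compared directly
with hand calculation; for larger chains a numerical linear-algebra routine
would be the more usual choice.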


F. Stationary Distributions: 


Let P be the transition probability matrix of a homogeneous Markov chain
$\{X_n, n \ge 0\}$. If there exists a probability vector $\hat{\mathbf{p}}$
such that

$$\hat{\mathbf{p}} P = \hat{\mathbf{p}}$$  (5.52)


then p is called a stationary distribution for the Markov chain. Equation 
(5.52) indicates that a stationary distribution p is a (left) eigenvector of P 
with eigenvalue 1. Note that any nonzero multiple of p is also an 
eigenvector of P. But the stationary distribution p is fixed by being a 
probability vector; that is, its components sum to unity. 


G. Limiting Distributions: 


A Markov chain is called regular if there is a finite positive integer m such
that after m time-steps, every state has a nonzero chance of being occupied,
no matter what the initial state. Let A > O denote that every element
$a_{ij}$ of A satisfies the condition $a_{ij} > 0$. Then, for a regular Markov
chain with transition probability matrix P, there exists an m > 0 such that
$P^m > O$. For a regular homogeneous Markov chain we have the following
theorem:


THEOREM 5.5.1
Let $\{X_n, n \ge 0\}$ be a regular homogeneous finite-state Markov chain with
transition matrix P. Then

$$\lim_{n \to \infty} P^n = \hat{P}$$  (5.53)

where $\hat{P}$ is a matrix whose rows are identical and equal to the
stationary distribution p for the Markov chain defined by Eq. (5.52).
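Theorem 5.5.1 can be observed numerically by raising a regular transition
matrix to a high power. The sketch below assumes a two-state chain with
made-up probabilities; solving Eq. (5.52) by hand for this matrix gives the
stationary distribution [5/6, 1/6], and the power 60 is an arbitrary choice
standing in for the limit.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_pow(P, n):
    """n-th matrix power of P, with P^0 defined as the identity matrix."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

P = [[0.9, 0.1], [0.5, 0.5]]          # assumed regular transition matrix
stationary = [5 / 6, 1 / 6]           # solves p P = p with components summing to 1

P_high = mat_pow(P, 60)
print(P_high)                          # both rows are close to the stationary distribution

# Check Eq. (5.52) directly for the hand-computed vector.
pP = [sum(stationary[i] * P[i][j] for i in range(2)) for j in range(2)]
print(pP, stationary)
```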


5.6 Poisson Processes 


A. Definitions: 


Let t represent a time variable. Suppose an experiment begins at t = 0.
Events of a particular kind occur randomly, the first at $T_1$, the second at
$T_2$, and so on. The r.v. $T_i$ denotes the time at which the ith event
occurs, and the values $t_i$ of $T_i$ (i = 1, 2, ...) are called points of
occurrence (Fig. 5-2).


Fig. 5-2


Let

$$Z_n = T_n - T_{n-1}$$  (5.54)

and $T_0 = 0$. Then $Z_n$ denotes the time between the (n − 1)st and the nth
events (Fig. 5-2). The sequence of ordered r.v.'s $\{Z_n, n \ge 1\}$ is
sometimes called an interarrival process. If all r.v.'s $Z_n$ are independent
and identically distributed, then $\{Z_n, n \ge 1\}$ is called a renewal
process or a recurrent process. From Eq. (5.54), we see that

$$T_n = Z_1 + Z_2 + \cdots + Z_n$$

where $T_n$ denotes the time from the beginning until the occurrence of the
nth event. Thus, $\{T_n, n \ge 0\}$ is sometimes called an arrival process.


B. Counting Processes: 


A random process $\{X(t), t \ge 0\}$ is said to be a counting process if X(t)
represents the total number of "events" that have occurred in the interval
(0, t). From its definition, we see that for a counting process, X(t) must
satisfy the following conditions:


1. $X(t) \ge 0$ and X(0) = 0.

2. X(t) is integer valued.

3. $X(s) \le X(t)$ if s < t.

4. X(t) − X(s) equals the number of events that have occurred on the
interval (s, t).


A typical sample function (or realization) of X(t) is shown in Fig. 5-3.

A counting process X(t) is said to possess independent increments if the
numbers of events which occur in disjoint time intervals are independent. A
counting process X(t) is said to possess stationary increments if the number
of events in the interval (s + h, t + h), that is, X(t + h) − X(s + h), has
the same distribution as the number of events in the interval (s, t), that is,
X(t) − X(s), for all s < t and h > 0.


Fig. 5-3 A sample function of a counting process.


C. Poisson Processes: 
One of the most important types of counting processes is the Poisson 
process (or Poisson counting process), which is defined as follows: 


DEFINITION 5.6.1 


A counting process X(t) is said to be a Poisson process with rate (or
intensity) $\lambda$ (> 0) if


1. X(0) = 0.

2. X(t) has independent increments.

3. The number of events in any interval of length t is Poisson distributed
with mean $\lambda t$; that is, for all s, t > 0,

$$P[X(t + s) - X(s) = n] = e^{-\lambda t}\, \frac{(\lambda t)^n}{n!} \qquad n = 0, 1, 2, \ldots$$  (5.55)

It follows from condition 3 of Def. 5.6.1 that a Poisson process has
stationary increments and that

$$E[X(t)] = \lambda t$$  (5.56)

Then by Eq. (2.51) (Sec. 2.7E), we have

$$\mathrm{Var}[X(t)] = \lambda t$$  (5.57)

Thus, the expected number of events in the unit interval (0, 1), or any other
interval of unit length, is just $\lambda$ (hence the name of the rate or
intensity).
An alternative definition of a Poisson process is given as follows: 


DEFINITION 5.6.2 
A counting process X(t) is said to be a Poisson process with rate (or
intensity) $\lambda$ (> 0) if


1. X(0) = 0.

2. X(t) has independent and stationary increments.

3. $P[X(t + \Delta t) - X(t) = 1] = \lambda\,\Delta t + o(\Delta t)$

4. $P[X(t + \Delta t) - X(t) \ge 2] = o(\Delta t)$


where $o(\Delta t)$ is a function of $\Delta t$ which goes to zero faster than
does $\Delta t$; that is,

$$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0$$  (5.58)

Note: Since addition or multiplication by a scalar does not change the
property of approaching zero, even when divided by $\Delta t$, $o(\Delta t)$
satisfies useful identities such as $o(\Delta t) + o(\Delta t) = o(\Delta t)$
and $a\,o(\Delta t) = o(\Delta t)$ for all constant a.

It can be shown that Def. 5.6.1 and Def. 5.6.2 are equivalent (Prob. 5.49).

Note that from conditions 3 and 4 of Def. 5.6.2, we have (Prob. 5.50)

$$P[X(t + \Delta t) - X(t) = 0] = 1 - \lambda\,\Delta t + o(\Delta t)$$  (5.59)


Equation (5.59) states that the probability that no event occurs in any short 
interval approaches unity as the duration of the interval approaches zero. It 
can be shown that in the Poisson process, the intervals between successive 
events are independent and identically distributed exponential r.v.’s (Prob. 
5.53). Thus, we also identify the Poisson process as a renewal process with 
exponentially distributed intervals. 
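The renewal-process view suggests a direct way to simulate a Poisson process:
add up independent exponential interarrival times and count how many events
fall in (0, t]. The sketch below does this with Python's standard `random`
module. The rate 2, the time 3, the number of trials, and the seed are all
assumptions chosen for illustration, and the printed estimates are subject to
sampling error.

```python
import random

def count_events(rate, t, rng):
    """Number of events in (0, t] when interarrival times are i.i.d. exponential(rate)."""
    n = 0
    clock = rng.expovariate(rate)        # time of the first event
    while clock <= t:
        n += 1
        clock += rng.expovariate(rate)   # add the next interarrival time
    return n

def sample_mean_var(rate, t, trials=20_000, seed=0):
    """Sample mean and sample variance of X(t) over independent simulated paths."""
    rng = random.Random(seed)
    counts = [count_events(rate, t, rng) for _ in range(trials)]
    m = sum(counts) / trials
    v = sum((c - m) ** 2 for c in counts) / (trials - 1)
    return m, v

rate, t = 2.0, 3.0                       # assumed rate and observation time
m, v = sample_mean_var(rate, t)
print(m, v, rate * t)                    # both estimates should be near rate * t = 6
```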

The autocorrelation function $R_X(t, s)$ and the autocovariance function
$K_X(t, s)$ of a Poisson process X(t) with rate $\lambda$ are given by
(Prob. 5.52)

$$R_X(t, s) = \lambda \min(t, s) + \lambda^2 ts$$  (5.60)

$$K_X(t, s) = \lambda \min(t, s)$$  (5.61)


5.7 Wiener Processes 


Another example of random processes with independent stationary 
increments is a Wiener process. 


DEFINITION 5.7.1 
A random process $\{X(t), t \ge 0\}$ is called a Wiener process if


1. X(t) has stationary independent increments.

2. The increment X(t) − X(s) (t > s) is normally distributed.

3. E[X(t)] = 0.

4. X(0) = 0.


The Wiener process is also known as the Brownian motion process, since it
originates as a model for Brownian motion, the motion of particles
suspended in a fluid. From Def. 5.7.1, we can verify that a Wiener process is
a normal process (Prob. 5.61) and

$$E[X(t)] = 0$$  (5.62)

$$\mathrm{Var}[X(t)] = \sigma^2 t$$  (5.63)

where $\sigma^2$ is a parameter of the Wiener process which must be determined
from observations. When $\sigma^2 = 1$, X(t) is called a standard Wiener (or
standard Brownian motion) process.

The autocorrelation function $R_X(t, s)$ and the autocovariance function
$K_X(t, s)$ of a Wiener process X(t) are given by (see Prob. 5.23)

$$R_X(t, s) = K_X(t, s) = \sigma^2 \min(t, s) \qquad s, t \ge 0$$  (5.64)
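Equations (5.63) and (5.64) can be checked by simulation at two time points.
The sketch below builds X(1) and X(2) from two independent normal increments,
each with variance $\sigma^2$, and estimates Var[X(1)], Var[X(2)], and
Cov[X(1), X(2)]. The value $\sigma$ = 1.5, the number of trials, and the seed
are assumptions made for illustration, and the estimates carry sampling error.

```python
import random

def sample_pair(sigma, rng):
    """One draw of (X(1), X(2)) built from two independent normal increments."""
    x1 = rng.gauss(0.0, sigma)           # X(1) - X(0), variance sigma^2
    x2 = x1 + rng.gauss(0.0, sigma)      # add X(2) - X(1), also variance sigma^2
    return x1, x2

def estimate(sigma, trials=20_000, seed=1):
    """Estimates of Var[X(1)], Var[X(2)], Cov[X(1), X(2)], using the known zero mean."""
    rng = random.Random(seed)
    pairs = [sample_pair(sigma, rng) for _ in range(trials)]
    var1 = sum(a * a for a, _ in pairs) / trials
    var2 = sum(b * b for _, b in pairs) / trials
    cov12 = sum(a * b for a, b in pairs) / trials
    return var1, var2, cov12

sigma = 1.5                              # assumed parameter, sigma^2 = 2.25
var1, var2, cov12 = estimate(sigma)
print(var1, var2, cov12)                 # expected near 2.25, 4.5, and 2.25
```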


DEFINITION 5.7.2 
A random process $\{X(t), t \ge 0\}$ is called a Wiener process with drift
coefficient $\mu$ if


1. X(t) has stationary independent increments.

2. X(t) is normally distributed with mean $\mu t$.

3. X(0) = 0.


From condition 2, the pdf of a standard Wiener process with drift coefficient
$\mu$ is given by

$$f_{X(t)}(x) = \frac{1}{\sqrt{2\pi t}}\, e^{-(x - \mu t)^2/(2t)}$$  (5.65)

5.8 Martingales 


Martingales have their roots in gaming theory. A martingale is a random 
process that models a fair game. It is a powerful tool with many applications, 
especially in the field of mathematical finance. 


A. Conditional Expectation and Filtrations: 


The conditional expectation $E(Y \mid X_1, \ldots, X_n)$ is a r.v. (see
Sec. 4.5 D) characterized by two properties:


1. The value of $E(Y \mid X_1, \ldots, X_n)$ depends only on the values of
$X_1, \ldots, X_n$, that is,

$$E(Y \mid X_1, \ldots, X_n) = g(X_1, \ldots, X_n)$$  (5.66)

2.
$$E[E(Y \mid X_1, \ldots, X_n)] = E(Y)$$  (5.67)


If $X_1, \ldots, X_n$ is a sequence of r.v.'s, we will use $F_n$ to denote the
information contained in $X_1, \ldots, X_n$ and we write $E(Y \mid F_n)$ for
$E(Y \mid X_1, \ldots, X_n)$, that is,

$$E(Y \mid X_1, \ldots, X_n) = E(Y \mid F_n)$$  (5.68)

We also define information carried by r.v.'s $X_1, \ldots, X_n$ in terms of
the associated event space ($\sigma$-field), $\sigma(X_1, \ldots, X_n)$. Thus,

$$F_n = \sigma(X_1, \ldots, X_n)$$  (5.69)

and we say that $F_n$ is an event space generated by $X_1, \ldots, X_n$. We
have

$$F_n \subset F_m \qquad \text{if } 1 \le n \le m$$  (5.70)

A collection $\{F_n, n = 1, 2, \ldots\}$ satisfying Eq. (5.70) is called a
filtration.


Note that if a r.v. Z can be written as a function of $X_1, \ldots, X_n$, it
is called measurable with respect to $X_1, \ldots, X_n$, or $F_n$-measurable.


Properties of Conditional Expectations:

1. Linearity:

$$E(aY_1 + bY_2 \mid F_n) = aE(Y_1 \mid F_n) + bE(Y_2 \mid F_n)$$  (5.71)

where a and b are constants.

2. Positivity:
If $Y \ge 0$, then

$$E(Y \mid F_n) \ge 0$$  (5.72)

3. Measurability:
If Y is $F_n$-measurable, then

$$E(Y \mid F_n) = Y$$  (5.73)

4. Stability:
If Z is $F_n$-measurable, then

$$E(YZ \mid F_n) = Z\,E(Y \mid F_n)$$  (5.74)

5. Independence Law:
If Y is independent of $F_n$, then

$$E(Y \mid F_n) = E(Y)$$  (5.75)

6. Tower Property:

$$E[E(Y \mid F_n) \mid F_m] = E(Y \mid F_m) \qquad \text{if } m \le n$$  (5.76)

7. Projection Law:

$$E[E(Y \mid F_n)] = E(Y)$$  (5.77)

8. Jensen's Inequality:
If g is a convex function and $E(|Y|) < \infty$, then

$$E(g(Y) \mid F_n) \ge g(E(Y \mid F_n))$$  (5.78)


B. Martingale in Discrete Time: 


Definition: 
A discrete-time random process $\{M_n, n \ge 0\}$ is a martingale with respect
to $F_n$ if

(1) $E(|M_n|) < \infty$ for all $n \ge 0$
(2) $E(M_{n+1} \mid F_n) = M_n$ for all n  (5.79)

It immediately follows from Eq. (5.79) that for a martingale

$$E(M_m \mid F_n) = M_n \qquad \text{for } m \ge n$$  (5.80)

A discrete-time random process $\{M_n, n \ge 0\}$ is a submartingale
(supermartingale) with respect to $F_n$ if

(1) $E(|M_n|) < \infty$ for all $n \ge 0$
(2) $E(M_{n+1} \mid F_n) \ge (\le)\ M_n$ for all n  (5.81)

While a martingale models a fair game, the submartingale and
supermartingale model favorable and unfavorable games, respectively.


Theorem 5.8.1
Let $\{M_n, n \ge 0\}$ be a martingale. Then for any given n

$$E(M_n) = E(M_{n-1}) = \cdots = E(M_0)$$  (5.82)

Equation (5.82) indicates that in a martingale all the r.v.'s have the same
expectation (Prob. 5.67).
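Theorem 5.8.1 can be illustrated with a gambler's capital under unit bets.
The sketch below is an assumed example: the capital starts at 100 and changes
by +1 with probability `p_win` and by −1 otherwise at each play. With
`p_win` = 0.5 the capital is a martingale and its expectation stays at 100;
with `p_win` = 0.6 the game is favorable (a submartingale) and the expectation
grows by 0.2 per play.

```python
def next_pmf(pmf, p_win):
    """pmf of the capital after one more unit bet."""
    out = {}
    for m, pm in pmf.items():
        out[m + 1] = out.get(m + 1, 0.0) + pm * p_win
        out[m - 1] = out.get(m - 1, 0.0) + pm * (1 - p_win)
    return out

def expectations(m0, p_win, n_steps):
    """List of E(M_0), E(M_1), ..., E(M_n) for capital starting at m0."""
    pmf = {m0: 1.0}
    values = [float(m0)]
    for _ in range(n_steps):
        pmf = next_pmf(pmf, p_win)
        values.append(sum(m * pm for m, pm in pmf.items()))
    return values

print(expectations(100, 0.5, 5))   # fair game: expectation stays at 100
print(expectations(100, 0.6, 5))   # favorable game: expectation increases
```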


Theorem 5.8.2 (Doob decomposition) 
Let $X = \{X_n, n \ge 0\}$ be a submartingale with respect to $F_n$. Then
there exists a martingale $M = \{M_n, n \ge 0\}$ and a process
$A = \{A_n, n \ge 0\}$ such that


(1) M is a martingale with respect to $F_n$;
(2) A is an increasing process, $A_{n+1} \ge A_n$;
(3) $A_n$ is $F_{n-1}$-measurable for all n;
(4) $X_n = M_n + A_n$.


(For the proof of this theorem see Prob. 5.78.) 


C. Stopping Time and the Optional Stopping Theorem: 


Definition: 
Atv. Tis called a stopping time with respect to F, if 


1. 7 takes values from the set {0, 1, 2, ..., 0} 
2. The event {7 =n} is F,,-measurable. 


EXAMPLE 5.1: A gambler has $100 and plays the slot machine at $1 per 
play. 


1. The gambler stops playing when his capital is depleted. The number $T = n_1$ of plays that it takes the gambler to stop play is a stopping time.

2. The gambler stops playing when his capital reaches $200. The number $T = n_2$ of plays that it takes the gambler to stop play is a stopping time.

3. The gambler stops playing when his capital reaches $200, or is depleted, whichever comes first. The number $T = \min(n_1, n_2)$ of plays that it takes the gambler to stop play is a stopping time.


EXAMPLE 5.2: A typical example of a random time T that is not a stopping time is the moment the stock price attains its maximum over a certain period. To determine whether T is a point of maximum, we have to know the future values of the stock price, and so the event $\{T = n\} \notin F_n$.


Lemma 5.8.1 


1. If $T_1$ and $T_2$ are stopping times, then so is $T_1 + T_2$.
2. If $T_1$ and $T_2$ are stopping times, then $T = \min(T_1, T_2)$ and $T = \max(T_1, T_2)$ are also stopping times.
3. $\min(T, n)$ is a stopping time for any fixed n.


Let $I_A$ denote the indicator function of A, that is, the r.v. which equals 1 if A occurs and 0 otherwise. Note that $I_{\{T > n\}}$, the indicator function of the event $\{T > n\}$, is $F_n$-measurable (since we need only the information up through time n to determine if we have stopped by time n).


Optional Stopping Theorem: 
Suppose $\{M_n, n \ge 0\}$ is a martingale and T is a stopping time. If

    (1) $P(T < \infty) = 1$    (5.83)
    (2) $E(|M_T|) < \infty$    (5.84)
    (3) $\lim_{n \to \infty} E(|M_n| I_{\{T > n\}}) = 0$    (5.85)

then

    $E(M_T) = E(M_0)$    (5.86)


Note that Eqs. (5.84) and (5.85) are always satisfied if the martingale is bounded and $P(T < \infty) = 1$.
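
A small exact check may help make the theorem concrete. The sketch below is an illustration only, not part of the text: it assumes a symmetric simple random walk $M_n$ started at a level `start` and stopped at the first time T it hits 0 or `upper` (the function name and parameters are hypothetical). The stopped walk is bounded, so the conditions above hold, and the code verifies that $E(M_{\min(T,n)})$ stays equal to $M_0$ for every n.

```python
from fractions import Fraction

def stopped_walk_means(start, upper, steps):
    """Return E[M_min(T, n)] for n = 0..steps, where M_n is a symmetric
    simple random walk started at `start` and T is the first time it
    hits 0 or `upper`."""
    half = Fraction(1, 2)
    dist = {start: Fraction(1)}          # distribution of the stopped walk
    means = [Fraction(start)]
    for _ in range(steps):
        nxt = {}
        for state, prob in dist.items():
            if state in (0, upper):      # already stopped: stay put
                nxt[state] = nxt.get(state, 0) + prob
            else:                        # fair +1 / -1 step
                nxt[state + 1] = nxt.get(state + 1, 0) + prob * half
                nxt[state - 1] = nxt.get(state - 1, 0) + prob * half
        dist = nxt
        means.append(sum(s * p for s, p in dist.items()))
    return means

means = stopped_walk_means(start=3, upper=10, steps=50)
print(all(m == 3 for m in means))
```

Exact rational arithmetic is used so that the equality can be tested without rounding error.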


D. Martingale in Continuous Time 


A continuous-time filtration is a family $\{F_t, t \ge 0\}$ contained in the event space F such that $F_s \subset F_t$ for $s < t$.


The continuous random process X(t) is a martingale with respect to $F_t$ if

    $E(|X(t)|) < \infty$    (5.87)
    $E(X(t) \mid F_s) = X(s)$ for $t \ge s$    (5.88)


Similarly, continuous-time submartingales and supermartingales can be defined by replacing the equal (=) sign by $\ge$ and $\le$, respectively, in Eq. (5.88).


SOLVED PROBLEMS 


Random Processes 


5.1. Let $X_1, X_2, \ldots$ be independent Bernoulli r.v.'s (Sec. 2.7A) with $P(X_n = 1) = p$ and $P(X_n = 0) = q = 1 - p$ for all n. The collection of r.v.'s $\{X_n, n \ge 1\}$ is a random process, and it is called a Bernoulli process.

(a) Describe the Bernoulli process.
(b) Construct a typical sample sequence of the Bernoulli process.


(a) The Bernoulli process $\{X_n, n \ge 1\}$ is a discrete-parameter, discrete-state process. The state space is E = {0, 1}, and the index set is T = {1, 2, ...}.

(b) A sample sequence of the Bernoulli process can be obtained by tossing a coin consecutively. If a head appears, we assign 1, and if a tail appears, we assign 0. Thus, for instance,

    n             1  2  3  4  5  6  7  8  9  10
    Coin tossing  H  T  T  H  H  H  T  H  H  T
    x_n           1  0  0  1  1  1  0  1  1  0


The sample sequence {x,,} obtained above is plotted in Fig. 5-4. 




Fig. 5-4 A sample function of a Bernoulli process. 


5.2. Let $Z_1, Z_2, \ldots$ be independent identically distributed r.v.'s with $P(Z_n = 1) = p$ and $P(Z_n = -1) = q = 1 - p$ for all n. Let

    $X_n = \sum_{i=1}^{n} Z_i$,  n = 1, 2, ...    (5.89)

and $X_0 = 0$. The collection of r.v.'s $\{X_n, n \ge 0\}$ is a random process, and it is called the simple random walk X(n) in one dimension.

(a) Describe the simple random walk X(n).

(b) Construct a typical sample sequence (or realization) of X(n).


(a) The simple random walk X(n) is a discrete-parameter (or time), discrete-state random process. The state space is E = {..., -2, -1, 0, 1, 2, ...}, and the index parameter set is T = {0, 1, 2, ...}.

(b) A sample sequence x(n) of a simple random walk X(n) can be produced by tossing a coin every second and letting x(n) increase by unity if a head appears and decrease by unity if a tail appears. A sample sequence x(n) obtained in this way is plotted in Fig. 5-5. The simple random walk X(n) specified in this problem is said to be unrestricted because there are no bounds on the possible values of $X_n$.


The simple random walk process is often used in the following 
primitive gambling model: Toss a coin. If a head appears, you win 
one dollar; if a tail appears, you lose one dollar (see Prob. 5.38). 


5.3. Let $\{X_n, n \ge 0\}$ be a simple random walk of Prob. 5.2. Now let the random process X(t) be defined by

    $X(t) = X_n$,  $n \le t < n + 1$

(a) Describe X(t).
(b) Construct a typical sample function of X(t).


(a) The random process X(t) is a continuous-parameter (or time), discrete-state random process. The state space is E = {..., -2, -1, 0, 1, 2, ...}, and the index parameter set is $T = \{t, t \ge 0\}$.

(b) A sample function x(t) of X(t) corresponding to Fig. 5-5 is shown in Fig. 5-6.


Fig. 5-5 A sample function of a random walk.


Fig. 5-6


5.4. Consider a random process X(t) defined by

    $X(t) = Y \cos \omega t$,  $t \ge 0$

where $\omega$ is a constant and Y is a uniform r.v. over (0, 1).
(a) Describe X(t).
(b) Sketch a few typical sample functions of X(t).


(a) The random process X(t) is a continuous-parameter (or time), continuous-state random process. The state space is $E = \{x: -1 < x < 1\}$ and the index parameter set is $T = \{t: t \ge 0\}$.


(b) Three sample functions of X(t) are sketched in Fig. 5-7.


Fig. 5-7


5.5. Consider patients coming to a doctor's office at random points in time. Let $X_n$ denote the time (in hours) that the nth patient has to wait in the office before being admitted to see the doctor.
(a) Describe the random process $X(n) = \{X_n, n \ge 1\}$.
(b) Construct a typical sample function of X(n).


(a) The random process X(n) is a discrete-parameter, continuous-state random process. The state space is $E = \{x: x \ge 0\}$, and the index parameter set is T = {1, 2, ...}.
(b) A sample function x(n) of X(n) is shown in Fig. 5-8.


Characterization of Random Processes 


5.6. Consider the Bernoulli process of Prob. 5.1. Determine the probability of occurrence of the sample sequence obtained in part (b) of Prob. 5.1.


Since the $X_n$'s are independent, we have

    $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_1 = x_1) P(X_2 = x_2) \cdots P(X_n = x_n)$    (5.90)


Thus, for the sample sequence of Fig. 5-4,

    $P(X_1 = 1, X_2 = 0, X_3 = 0, X_4 = 1, X_5 = 1, X_6 = 1, X_7 = 0, X_8 = 1, X_9 = 1, X_{10} = 0) = p^6 q^4$


Fig. 5-8


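5.7. Consider the random process X(t) of Prob. 5.4. Determine the pdf's of X(t) at $t = 0, \pi/4\omega, \pi/2\omega, \pi/\omega$.


For t = 0, $X(0) = Y \cos 0 = Y$. Thus,

    $f_{X(0)}(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$


For $t = \pi/4\omega$, $X(\pi/4\omega) = Y \cos \pi/4 = (1/\sqrt{2})\,Y$. Thus,

    $f_{X(\pi/4\omega)}(x) = \begin{cases} \sqrt{2} & 0 < x < 1/\sqrt{2} \\ 0 & \text{otherwise} \end{cases}$


For $t = \pi/2\omega$, $X(\pi/2\omega) = Y \cos \pi/2 = 0$; that is, $X(\pi/2\omega) = 0$ irrespective of the value of Y. Thus, the pmf of $X(\pi/2\omega)$ is

    $p_{X(\pi/2\omega)}(x) = P(X = 0) = 1$


For $t = \pi/\omega$, $X(\pi/\omega) = Y \cos \pi = -Y$. Thus,

    $f_{X(\pi/\omega)}(x) = \begin{cases} 1 & -1 < x < 0 \\ 0 & \text{otherwise} \end{cases}$


5.8. Derive the first-order probability distribution of the simple random walk X(n) of Prob. 5.2.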


The first-order probability distribution of the simple random walk X(n) is given by

    $p_n(k) = P(X_n = k)$

where k is an integer. Note that $P(X_0 = 0) = 1$. We note that $p_n(k) = 0$ if $n < |k|$ because the simple random walk cannot get to level k in less than |k| steps. Thus, $n \ge |k|$.

Let $N_n^+$ and $N_n^-$ be the r.v.'s denoting the numbers of +1s and -1s, respectively, in the first n steps.


Then

    $n = N_n^+ + N_n^-$    (5.91)
    $X_n = N_n^+ - N_n^-$    (5.92)

Adding Eqs. (5.91) and (5.92), we get

    $N_n^+ = \frac{1}{2}(n + X_n)$    (5.93)

Thus, $X_n = k$ if and only if $N_n^+ = \frac{1}{2}(n + k)$. From Eq. (5.93), we note that $2N_n^+ = n + X_n$ must be even. Thus, $X_n$ must be even if n is even, and $X_n$ must be odd if n is odd. We note that $N_n^+$ is a binomial r.v. with parameters (n, p). Thus, by Eq. (2.36), we obtain

    $p_n(k) = \binom{n}{(n+k)/2} p^{(n+k)/2} q^{(n-k)/2}$,  q = 1 - p    (5.94)

where $n \ge |k|$, and n and k are either both even or both odd.
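
Equation (5.94) translates directly into a few lines of code. The following is a minimal sketch, assuming exact rational arithmetic for p; the function name `walk_pmf` is illustrative and does not come from the text.

```python
from fractions import Fraction
from math import comb

def walk_pmf(n, k, p):
    """p_n(k) = P(X_n = k) for the simple random walk, following Eq. (5.94)."""
    if abs(k) > n or (n + k) % 2 != 0:
        return Fraction(0)       # unreachable level or parity mismatch
    q = 1 - p
    ups = (n + k) // 2           # number of +1 steps, N_n^+
    return comb(n, ups) * p**ups * q**(n - ups)

p = Fraction(1, 3)
print(walk_pmf(4, -2, p))                              # 4 p q^3 for p = 1/3
print(sum(walk_pmf(4, k, p) for k in range(-4, 5)))    # total probability
```

The parity test mirrors the remark that n and k must be both even or both odd.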


5.9. Consider the simple random walk X(n) of Prob. 5.2.
(a) Find the probability that X(n) = -2 after four steps.

(b) Verify the result of part (a) by enumerating all possible sample sequences that lead to the value X(n) = -2 after four steps.


(a) Setting k = -2 and n = 4 in Eq. (5.94), we obtain

    $P(X_4 = -2) = p_4(-2) = \binom{4}{1} p q^3 = 4pq^3$,  q = 1 - p


Fig. 5-9


(b) All possible sample functions that lead to the value $X_4 = -2$ after four steps are shown in Fig. 5-9. For each sample sequence, $P(X_4 = -2) = pq^3$. There are only four sample functions that lead to the value $X_4 = -2$ after four steps. Thus, $P(X_4 = -2) = 4pq^3$.
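
The enumeration in part (b) can also be carried out mechanically. The sketch below lists every sequence of four steps of +1 or -1 and keeps those that sum to -2; the helper name `paths_ending_at` is an assumption made for illustration.

```python
from itertools import product

def paths_ending_at(n, k):
    """All step sequences (+1/-1) of length n whose sum equals k."""
    return [steps for steps in product((1, -1), repeat=n) if sum(steps) == k]

paths = paths_ending_at(4, -2)
for steps in paths:
    print(steps)
print(len(paths))
```

Each surviving sequence contains one +1 and three -1 steps, which is why each has probability $pq^3$.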


5.10. Find the mean and variance of the simple random walk X(n) of Prob. 5.2.


From Eq. (5.89), we have

    $X_n = X_{n-1} + Z_n$,  n = 1, 2, ...    (5.95)

and $X_0 = 0$ and $Z_n$ (n = 1, 2, ...) are independent and identically distributed (iid) r.v.'s with

    $P(Z_n = +1) = p$    $P(Z_n = -1) = q = 1 - p$


From Eq. (5.95), we observe that

    $X_1 = X_0 + Z_1 = Z_1$
    $X_2 = X_1 + Z_2 = Z_1 + Z_2$
    ...
    $X_n = Z_1 + Z_2 + \cdots + Z_n$    (5.96)

Then, because the $Z_n$ are iid r.v.'s and $X_0 = 0$, by Eqs. (4.132) and (4.136), we have

    $E(X_n) = E\left(\sum_{k=1}^{n} Z_k\right) = n E(Z_k)$
    $\mathrm{Var}(X_n) = \mathrm{Var}\left(\sum_{k=1}^{n} Z_k\right) = n\,\mathrm{Var}(Z_k)$

Now

    $E(Z_k) = (1)p + (-1)q = p - q$    (5.97)
    $E(Z_k^2) = (1)^2 p + (-1)^2 q = p + q = 1$    (5.98)

Thus,

    $\mathrm{Var}(Z_k) = E(Z_k^2) - [E(Z_k)]^2 = 1 - (p - q)^2 = 4pq$    (5.99)

Hence,

    $E(X_n) = n(p - q)$,  q = 1 - p    (5.100)
    $\mathrm{Var}(X_n) = 4npq$,  q = 1 - p    (5.101)

Note that if p = q = 1/2, then

    $E(X_n) = 0$    (5.102)
    $\mathrm{Var}(X_n) = n$    (5.103)
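
As a cross-check on Eqs. (5.100) and (5.101), the exact distribution of $X_n$ can be built one step at a time and its moments compared with the closed forms. This is a sketch under the assumption of rational p; the helper names are illustrative only.

```python
from fractions import Fraction

def walk_distribution(n, p):
    """Exact pmf of X_n as a dict {k: P(X_n = k)}, built step by step."""
    q = 1 - p
    dist = {0: Fraction(1)}                  # X_0 = 0
    for _ in range(n):
        nxt = {}
        for k, prob in dist.items():
            nxt[k + 1] = nxt.get(k + 1, 0) + prob * p
            nxt[k - 1] = nxt.get(k - 1, 0) + prob * q
        dist = nxt
    return dist

def mean_and_variance(dist):
    """Mean and variance of a pmf given as {value: probability}."""
    mean = sum(k * pr for k, pr in dist.items())
    second = sum(k * k * pr for k, pr in dist.items())
    return mean, second - mean**2

n, p = 6, Fraction(1, 4)
q = 1 - p
mean, var = mean_and_variance(walk_distribution(n, p))
print(mean == n * (p - q), var == 4 * n * p * q)
```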


5.11. Find the autocorrelation function $R_X(n, m)$ of the simple random walk X(n) of Prob. 5.2.


From Eq. (5.96), we can express $X_n$ as

    $X_n = \sum_{i=0}^{n} Z_i$,  n = 1, 2, ...    (5.104)

where $Z_0 = X_0 = 0$ and $Z_i$ ($i \ge 1$) are iid r.v.'s with

    $P(Z_i = +1) = p$    $P(Z_i = -1) = q = 1 - p$


By Eq. (5.7),

    $R_X(n, m) = E[X(n)X(m)] = E(X_n X_m)$    (5.105)


Then by Eq. (5.104),

    $R_X(n, m) = \sum_{i=0}^{n} \sum_{k=0}^{m} E(Z_i Z_k) = \sum_{i=0}^{\min(n,m)} E(Z_i^2) + \sum_{i=0}^{n} \sum_{k=0,\,k \ne i}^{m} E(Z_i) E(Z_k)$    (5.106)


Using Eqs. (5.97) and (5.98), we obtain

    $R_X(n, m) = \min(n, m) + [nm - \min(n, m)](p - q)^2$    (5.107)

or

    $R_X(n, m) = \begin{cases} m + (nm - m)(p - q)^2 & m < n \\ n + (nm - n)(p - q)^2 & n < m \end{cases}$    (5.108)


Note that if p = q = 1/2, then

    $R_X(n, m) = \min(n, m)$,  n, m > 0


5.12. Consider the random process X(t) of Prob. 5.4; that is,

    $X(t) = Y \cos \omega t$,  $t \ge 0$

where $\omega$ is a constant and Y is a uniform r.v. over (0, 1).
(a) Find E[X(t)].
(b) Find the autocorrelation function $R_X(t, s)$ of X(t).
(c) Find the autocovariance function $K_X(t, s)$ of X(t).


(a) From Eqs. (2.58) and (2.110), we have $E(Y) = \frac{1}{2}$ and $E(Y^2) = \frac{1}{3}$. Thus,

    $E[X(t)] = E(Y \cos \omega t) = E(Y) \cos \omega t = \frac{1}{2} \cos \omega t$    (5.109)

(b) By Eq. (5.7), we have

    $R_X(t, s) = E[X(t)X(s)] = E(Y^2 \cos \omega t \cos \omega s) = E(Y^2) \cos \omega t \cos \omega s = \frac{1}{3} \cos \omega t \cos \omega s$    (5.110)

(c) By Eq. (5.10), we have

    $K_X(t, s) = R_X(t, s) - E[X(t)]E[X(s)] = \frac{1}{3} \cos \omega t \cos \omega s - \frac{1}{4} \cos \omega t \cos \omega s = \frac{1}{12} \cos \omega t \cos \omega s$    (5.111)


5.13. Consider a discrete-parameter random process $X(n) = \{X_n, n \ge 1\}$ where the $X_n$'s are iid r.v.'s with common cdf $F_X(x)$, mean $\mu$, and variance $\sigma^2$.
(a) Find the joint cdf of X(n).
(b) Find the mean of X(n).
(c) Find the autocorrelation function $R_X(n, m)$ of X(n).
(d) Find the autocovariance function $K_X(n, m)$ of X(n).


(a) Since the $X_n$'s are iid r.v.'s with common cdf $F_X(x)$, the joint cdf of X(n) is given by

    $F_X(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_X(x_i)$    (5.112)

(b) The mean of X(n) is

    $\mu_X(n) = E(X_n) = \mu$ for all n    (5.113)

(c) If $n \ne m$, by Eqs. (5.7) and (5.113),

    $R_X(n, m) = E(X_n X_m) = E(X_n)E(X_m) = \mu^2$

If n = m, then by Eq. (2.31),

    $E(X_n^2) = \mathrm{Var}(X_n) + [E(X_n)]^2 = \sigma^2 + \mu^2$

Hence,

    $R_X(n, m) = \begin{cases} \mu^2 & n \ne m \\ \sigma^2 + \mu^2 & n = m \end{cases}$    (5.114)

(d) By Eq. (5.10),

    $K_X(n, m) = R_X(n, m) - \mu_X(n)\mu_X(m) = \begin{cases} 0 & n \ne m \\ \sigma^2 & n = m \end{cases}$    (5.115)


Classification of Random Processes 


5.14. Show that a random process which is stationary to order n is also stationary to all orders lower than n.


Assume that Eq. (5.14) holds for some particular n; that is,

    $P\{X(t_1) \le x_1, \ldots, X(t_n) \le x_n\} = P\{X(t_1 + \tau) \le x_1, \ldots, X(t_n + \tau) \le x_n\}$

for any $\tau$. Letting $x_n \to \infty$, we have [see Eq. (3.63)]

    $P\{X(t_1) \le x_1, \ldots, X(t_{n-1}) \le x_{n-1}\} = P\{X(t_1 + \tau) \le x_1, \ldots, X(t_{n-1} + \tau) \le x_{n-1}\}$

and the process is stationary to order n - 1. Continuing the same procedure, we see that the process is stationary to all orders lower than n.


5.15. Show that if $\{X(t), t \in T\}$ is a strict-sense stationary random process, then it is also WSS.


Since X(t) is strict-sense stationary, the first- and second-order distributions are invariant through time translation for all $\tau \in T$. Then we have

    $\mu_X(t) = E[X(t)] = E[X(t + \tau)] = \mu_X(t + \tau)$

And, hence, the mean function $\mu_X(t)$ must be constant; that is,

    $E[X(t)] = \mu$ (constant)

Similarly, we have

    $E[X(s)X(t)] = E[X(s + \tau)X(t + \tau)]$

so that the autocorrelation function would depend on the time points s and t only through the difference |t - s|. Thus, X(t) is WSS.


5.16. Let $\{X_n, n \ge 0\}$ be a sequence of iid r.v.'s with mean 0 and variance 1. Show that $\{X_n, n \ge 0\}$ is a WSS process.


By Eq. (5.113),

    $E(X_n) = 0$ (constant) for all n

and by Eq. (5.114),

    $R_X(n, n + k) = E(X_n X_{n+k}) = \begin{cases} E(X_n)E(X_{n+k}) = 0 & k \ne 0 \\ E(X_n^2) = \mathrm{Var}(X_n) = 1 & k = 0 \end{cases}$

which depends only on k. Thus, $\{X_n\}$ is a WSS process.


5.17. Show that if a random process X(t) is WSS, then it must also be covariance stationary.


If X(t) is WSS, then

    $E[X(t)] = \mu$ (constant) for all t
    $R_X(t, t + \tau) = R_X(\tau)$ for all t

Now

    $K_X(t, t + \tau) = \mathrm{Cov}[X(t), X(t + \tau)] = R_X(t, t + \tau) - E[X(t)]E[X(t + \tau)] = R_X(\tau) - \mu^2$

which indicates that $K_X(t, t + \tau)$ depends only on $\tau$; thus, X(t) is covariance stationary.


5.18. Consider a random process X(t) defined by

    $X(t) = U \cos \omega t + V \sin \omega t$,  $-\infty < t < \infty$    (5.116)

where $\omega$ is constant and U and V are r.v.'s.
(a) Show that the condition

    $E(U) = E(V) = 0$    (5.117)

is necessary for X(t) to be stationary.

(b) Show that X(t) is WSS if and only if U and V are uncorrelated with equal variance; that is,

    $E(UV) = 0$    $E(U^2) = E(V^2) = \sigma^2$    (5.118)


(a) Now

    $\mu_X(t) = E[X(t)] = E(U) \cos \omega t + E(V) \sin \omega t$

must be independent of t for X(t) to be stationary. This is possible only if $\mu_X(t) = 0$, that is, E(U) = E(V) = 0.


(b) If X(t) is WSS, then

    $E[X^2(0)] = E[X^2(\pi/2\omega)] = R_{XX}(0) = \sigma_X^2$

But X(0) = U and $X(\pi/2\omega) = V$; thus,

    $E(U^2) = E(V^2) = \sigma_X^2 = \sigma^2$

Using the above result, we obtain

    $R_X(t, t + \tau) = E[X(t)X(t + \tau)]$
    $= E\{(U \cos \omega t + V \sin \omega t)[U \cos \omega(t + \tau) + V \sin \omega(t + \tau)]\}$
    $= \sigma^2 \cos \omega\tau + E(UV) \sin(2\omega t + \omega\tau)$    (5.119)

which will be a function of $\tau$ only if E(UV) = 0. Conversely, if E(UV) = 0 and $E(U^2) = E(V^2) = \sigma^2$, then from the result of part (a) and Eq. (5.119), we have

    $\mu_X(t) = 0$
    $R_X(t, t + \tau) = \sigma^2 \cos \omega\tau = R_X(\tau)$

Hence, X(t) is WSS.
5.19. Consider a random process X(t) defined by

    $X(t) = U \cos t + V \sin t$,  $-\infty < t < \infty$

where U and V are independent r.v.'s, each of which assumes the values -2 and 1 with the probabilities 1/3 and 2/3, respectively. Show that X(t) is WSS but not strict-sense stationary.


We have

    $E(U) = E(V) = \frac{1}{3}(-2) + \frac{2}{3}(1) = 0$
    $E(U^2) = E(V^2) = \frac{1}{3}(-2)^2 + \frac{2}{3}(1)^2 = 2$

Since U and V are independent,

    $E(UV) = E(U)E(V) = 0$

Thus, by the results of Prob. 5.18, X(t) is WSS. To see if X(t) is strict-sense stationary, we consider $E[X^3(t)]$.

    $E[X^3(t)] = E[(U \cos t + V \sin t)^3]$
    $= E(U^3) \cos^3 t + 3E(U^2 V) \cos^2 t \sin t + 3E(UV^2) \cos t \sin^2 t + E(V^3) \sin^3 t$

Now

    $E(U^3) = E(V^3) = \frac{1}{3}(-2)^3 + \frac{2}{3}(1)^3 = -2$
    $E(U^2 V) = E(U^2)E(V) = 0$    $E(UV^2) = E(U)E(V^2) = 0$

Thus,

    $E[X^3(t)] = -2(\cos^3 t + \sin^3 t)$

which is a function of t. From Eq. (5.16), we see that all the moments of a strict-sense stationary process must be independent of time. Thus, X(t) is not strict-sense stationary.
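
The moments in this problem can be checked numerically by summing over the four possible outcomes of (U, V). The sketch below assumes the probabilities 1/3 and 2/3 for the values -2 and 1, which are the values consistent with E(U) = 0 and E(U^2) = 2 in the solution; the function name `moment` is illustrative.

```python
from fractions import Fraction
from itertools import product
from math import cos, sin, pi

# Values and probabilities of U and V (assumed: -2 w.p. 1/3, 1 w.p. 2/3).
VALUES = [(-2, Fraction(1, 3)), (1, Fraction(2, 3))]

def moment(t, order):
    """E[X(t)**order] for X(t) = U cos t + V sin t, by summing over the
    four outcomes (u, v) of the independent pair."""
    total = 0.0
    for (u, pu), (v, pv) in product(VALUES, repeat=2):
        total += float(pu * pv) * (u * cos(t) + v * sin(t)) ** order
    return total

for t in (0.0, pi / 4, pi / 2, pi):
    print(round(moment(t, 1), 12), round(moment(t, 2), 12), round(moment(t, 3), 12))
```

The first two moments do not change with t, while the third does, which is the behavior the solution relies on.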


5.20. Consider a random process X(t) defined by

    $X(t) = A \cos(\omega t + \Theta)$,  $-\infty < t < \infty$

where A and $\omega$ are constants and $\Theta$ is a uniform r.v. over $(-\pi, \pi)$. Show that X(t) is WSS.


From Eq. (2.56), we have

    $f_\Theta(\theta) = \begin{cases} \frac{1}{2\pi} & -\pi < \theta < \pi \\ 0 & \text{otherwise} \end{cases}$

Then

    $\mu_X(t) = \frac{A}{2\pi} \int_{-\pi}^{\pi} \cos(\omega t + \theta)\, d\theta = 0$    (5.120)

Setting $s = t + \tau$ in Eq. (5.7), we have

    $R_{XX}(t, t + \tau) = \frac{A^2}{2\pi} \int_{-\pi}^{\pi} \cos(\omega t + \theta) \cos[\omega(t + \tau) + \theta]\, d\theta$
    $= \frac{A^2}{2\pi} \int_{-\pi}^{\pi} \frac{1}{2}[\cos \omega\tau + \cos(2\omega t + 2\theta + \omega\tau)]\, d\theta$
    $= \frac{A^2}{2} \cos \omega\tau$    (5.121)

Since the mean of X(t) is a constant and the autocorrelation of X(t) is a function of time difference only, we conclude that X(t) is WSS.


5.21. Let $\{X(t), t \ge 0\}$ be a random process with stationary independent increments, and assume that X(0) = 0. Show that

    $E[X(t)] = \mu_1 t$    (5.122)

where $\mu_1 = E[X(1)]$.

Let

    $f(t) = E[X(t)] = E[X(t) - X(0)]$

Then, for any t and s and using Eq. (4.132) and the property of the stationary independent increments, we have

    $f(t + s) = E[X(t + s) - X(0)]$
    $= E[X(t + s) - X(s) + X(s) - X(0)]$
    $= E[X(t + s) - X(s)] + E[X(s) - X(0)]$
    $= E[X(t) - X(0)] + E[X(s) - X(0)]$
    $= f(t) + f(s)$    (5.123)

The only solution to the above functional equation is f(t) = ct, where c is a constant. Since c = f(1) = E[X(1)], we obtain

    $E[X(t)] = \mu_1 t$,  $\mu_1 = E[X(1)]$


5.22. Let $\{X(t), t \ge 0\}$ be a random process with stationary independent increments, and assume that X(0) = 0. Show that

(a)
    $\mathrm{Var}[X(t)] = \sigma_1^2 t$    (5.124)
(b)
    $\mathrm{Var}[X(t) - X(s)] = \sigma_1^2 (t - s)$,  t > s    (5.125)

where $\sigma_1^2 = \mathrm{Var}[X(1)]$.


(a) Let $g(t) = \mathrm{Var}[X(t)] = \mathrm{Var}[X(t) - X(0)]$.
Then, for any t and s and using Eq. (4.136) and the property of the stationary independent increments, we get

    $g(t + s) = \mathrm{Var}[X(t + s) - X(0)]$
    $= \mathrm{Var}[X(t + s) - X(s) + X(s) - X(0)]$
    $= \mathrm{Var}[X(t + s) - X(s)] + \mathrm{Var}[X(s) - X(0)]$
    $= \mathrm{Var}[X(t) - X(0)] + \mathrm{Var}[X(s) - X(0)]$
    $= g(t) + g(s)$

which is the same functional equation as Eq. (5.123). Thus, g(t) = kt, where k is a constant. Since k = g(1) = Var[X(1)], we obtain

    $\mathrm{Var}[X(t)] = \sigma_1^2 t$,  $\sigma_1^2 = \mathrm{Var}[X(1)]$


(b) Let t > s. Then

    $\mathrm{Var}[X(t)] = \mathrm{Var}[X(t) - X(s) + X(s) - X(0)]$
    $= \mathrm{Var}[X(t) - X(s)] + \mathrm{Var}[X(s) - X(0)]$
    $= \mathrm{Var}[X(t) - X(s)] + \mathrm{Var}[X(s)]$

Thus, using Eq. (5.124), we obtain

    $\mathrm{Var}[X(t) - X(s)] = \mathrm{Var}[X(t)] - \mathrm{Var}[X(s)] = \sigma_1^2 (t - s)$


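5.23. Let $\{X(t), t \ge 0\}$ be a random process with stationary independent increments, and assume that X(0) = 0. Show that

    $\mathrm{Cov}[X(t), X(s)] = K_X(t, s) = \sigma_1^2 \min(t, s)$    (5.126)

where $\sigma_1^2 = \mathrm{Var}[X(1)]$.


By definition (2.28),

    $\mathrm{Var}[X(t) - X(s)] = E(\{X(t) - X(s) - E[X(t) - X(s)]\}^2)$
    $= E[(\{X(t) - E[X(t)]\} - \{X(s) - E[X(s)]\})^2]$
    $= E(\{X(t) - E[X(t)]\}^2 - 2\{X(t) - E[X(t)]\}\{X(s) - E[X(s)]\} + \{X(s) - E[X(s)]\}^2)$
    $= \mathrm{Var}[X(t)] - 2\,\mathrm{Cov}[X(t), X(s)] + \mathrm{Var}[X(s)]$

Thus,

    $\mathrm{Cov}[X(t), X(s)] = \frac{1}{2}\{\mathrm{Var}[X(t)] + \mathrm{Var}[X(s)] - \mathrm{Var}[X(t) - X(s)]\}$

Using Eqs. (5.124) and (5.125), we obtain

    $K_X(t, s) = \begin{cases} \frac{1}{2}\sigma_1^2 [t + s - (t - s)] = \sigma_1^2 s & t > s \\ \frac{1}{2}\sigma_1^2 [t + s - (s - t)] = \sigma_1^2 t & s > t \end{cases}$

or

    $K_X(t, s) = \sigma_1^2 \min(t, s)$

where $\sigma_1^2 = \mathrm{Var}[X(1)]$.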


5.24. (a) Show that a simple random walk X(n) of Prob. 5.2 is a Markov chain.

(b) Find its one-step transition probabilities.


(a) From Eq. (5.96) (Prob. 5.10), $X(n) = \{X_n, n \ge 0\}$ can be expressed as

    $X_0 = 0$    $X_n = \sum_{i=1}^{n} Z_i$,  $n \ge 1$

where $Z_n$ (n = 1, 2, ...) are iid r.v.'s with

    $P(Z_n = k) = a_k$  (k = 1, -1)  and  $a_1 = p$, $a_{-1} = q = 1 - p$

Then $X(n) = \{X_n, n \ge 0\}$ is a Markov chain, since

    $P(X_{n+1} = i_{n+1} \mid X_0 = 0, X_1 = i_1, \ldots, X_n = i_n)$
    $= P(Z_{n+1} + i_n = i_{n+1} \mid X_0 = 0, X_1 = i_1, \ldots, X_n = i_n)$
    $= P(Z_{n+1} = i_{n+1} - i_n) = a_{i_{n+1} - i_n} = P(X_{n+1} = i_{n+1} \mid X_n = i_n)$

since $Z_{n+1}$ is independent of $X_0, X_1, \ldots, X_n$.


(b) The one-step transition probabilities are given by

    $p_{jk} = P(X_n = k \mid X_{n-1} = j) = \begin{cases} p & k = j + 1 \\ q & k = j - 1 \\ 0 & \text{otherwise} \end{cases}$

which do not depend on n. Thus, a simple random walk X(n) is a homogeneous Markov chain.


5.25. Show that for a Markov process X(t), the second-order distribution is sufficient to characterize X(t).


Let X(t) be a Markov process with the nth-order distribution

    $F_X(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n) = P\{X(t_1) \le x_1, X(t_2) \le x_2, \ldots, X(t_n) \le x_n\}$

Then, using the Markov property (5.26), we have

    $F_X(x_1, \ldots, x_n; t_1, \ldots, t_n)$
    $= P\{X(t_n) \le x_n \mid X(t_1) \le x_1, \ldots, X(t_{n-1}) \le x_{n-1}\}\, P\{X(t_1) \le x_1, \ldots, X(t_{n-1}) \le x_{n-1}\}$
    $= P\{X(t_n) \le x_n \mid X(t_{n-1}) \le x_{n-1}\}\, F_X(x_1, \ldots, x_{n-1}; t_1, \ldots, t_{n-1})$

Applying the above relation repeatedly for lower-order distributions, we can write

    $F_X(x_1, \ldots, x_n; t_1, \ldots, t_n) = F_X(x_1; t_1) \prod_{k=2}^{n} P\{X(t_k) \le x_k \mid X(t_{k-1}) \le x_{k-1}\}$    (5.127)

Hence, all finite-order distributions of a Markov process can be completely determined by the second-order distribution.


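5.26. Show that if a normal process is WSS, then it is also strict-sense stationary.


By Eq. (5.29), a normal random process X(t) is completely characterized by the specification of the mean E[X(t)] and the covariance function $K_X(t, s)$ of the process. Suppose that X(t) is WSS. Then, by Eqs. (5.21) and (5.22), Eq. (5.29) becomes

    $\Psi_{X(t_1) \cdots X(t_n)}(\omega_1, \ldots, \omega_n) = \exp\left\{ j\mu \sum_{i=1}^{n} \omega_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n} K_X(t_i - t_k)\, \omega_i \omega_k \right\}$    (5.128)

Now we translate all of the time instants $t_1, t_2, \ldots, t_n$ by the same amount $\tau$. The joint characteristic function of the new r.v.'s $X(t_i + \tau)$, i = 1, 2, ..., n, is then

    $\Psi_{X(t_1 + \tau) \cdots X(t_n + \tau)}(\omega_1, \ldots, \omega_n) = \exp\left\{ j\mu \sum_{i=1}^{n} \omega_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n} K_X[t_i + \tau - (t_k + \tau)]\, \omega_i \omega_k \right\}$
    $= \exp\left\{ j\mu \sum_{i=1}^{n} \omega_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n} K_X(t_i - t_k)\, \omega_i \omega_k \right\}$
    $= \Psi_{X(t_1) \cdots X(t_n)}(\omega_1, \ldots, \omega_n)$    (5.129)

which indicates that the joint characteristic function (and hence the corresponding joint pdf) is unaffected by a shift in the time origin. Since this result holds for any n and any set of time instants ($t_i \in T$, i = 1, 2, ..., n), it follows that if a normal process is WSS, then it is also strict-sense stationary.


5.27. Let $\{X(t), -\infty < t < \infty\}$ be a zero-mean, stationary, normal process with the autocorrelation function

    $R_X(\tau) = \begin{cases} 1 - \frac{|\tau|}{T} & -T \le \tau \le T \\ 0 & \text{otherwise} \end{cases}$    (5.130)

Let $\{X(t_i), i = 1, 2, \ldots, n\}$ be a sequence of n samples of the process taken at the time instants

    $t_i = i\,\frac{T}{2}$,  i = 1, 2, ..., n

Find the mean and the variance of the sample mean

    $\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} X(t_i)$    (5.131)


Since X(t) is zero-mean and stationary, we have

    $E[X(t_i)] = 0$

and

    $R_X(t_i, t_k) = E[X(t_i)X(t_k)] = R_X(t_k - t_i) = R_X[(k - i)T/2]$

Thus,

    $E(\hat{\mu}_n) = E\left[\frac{1}{n} \sum_{i=1}^{n} X(t_i)\right] = \frac{1}{n} \sum_{i=1}^{n} E[X(t_i)] = 0$    (5.132)

and

    $\mathrm{Var}(\hat{\mu}_n) = E\{[\hat{\mu}_n - E(\hat{\mu}_n)]^2\} = E(\hat{\mu}_n^2)$
    $= E\left\{\left[\frac{1}{n} \sum_{i=1}^{n} X(t_i)\right]\left[\frac{1}{n} \sum_{k=1}^{n} X(t_k)\right]\right\}$
    $= \frac{1}{n^2} \sum_{i=1}^{n} \sum_{k=1}^{n} E[X(t_i)X(t_k)] = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{k=1}^{n} R_X[(k - i)T/2]$

By Eq. (5.130),

    $R_X[(k - i)T/2] = \begin{cases} 1 & k = i \\ \frac{1}{2} & |k - i| = 1 \\ 0 & |k - i| \ge 2 \end{cases}$

Thus,

    $\mathrm{Var}(\hat{\mu}_n) = \frac{1}{n^2}\left[n(1) + 2(n - 1)\left(\frac{1}{2}\right) + 0\right] = \frac{1}{n^2}(2n - 1)$    (5.133)


Discrete-Parameter Markov Chains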


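5.28. Show that if P is a Markov matrix, then $P^n$ is also a Markov matrix for any positive integer n.


Let

    $P = [p_{ij}] = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix}$

Then by the property of a Markov matrix [Eq. (5.35)], we can write

    $\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$

or

    $Pa = a$    (5.134)

where

    $a^T = [1 \quad 1 \quad \cdots \quad 1]$

Premultiplying both sides of Eq. (5.134) by P, we obtain

    $P^2 a = Pa = a$

which indicates that $P^2$ is also a Markov matrix (its elements, being sums of products of nonnegative numbers, are nonnegative). Repeated premultiplication by P yields

    $P^n a = a$

which shows that $P^n$ is also a Markov matrix.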
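
The row-sum argument can be checked on a concrete matrix. The sketch below uses a 3 x 3 Markov matrix chosen purely for illustration (it does not appear in the text) and plain Python lists with exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, n):
    """P**n for a positive integer n, by repeated multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result

def is_markov(p):
    """True if every entry is nonnegative and every row sums to 1."""
    return all(min(row) >= 0 and sum(row) == 1 for row in p)

F = Fraction
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(0),    F(1, 3), F(2, 3)],
     [F(1, 5), F(0),    F(4, 5)]]
print(all(is_markov(mat_pow(P, n)) for n in range(1, 8)))
```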
5.29. Verify Eq. (5.39); that is,

    $p(n) = p(0)P^n$


We verify Eq. (5.39) by induction. If the state of $X_0$ is i, state $X_1$ will be j only if a transition is made from i to j. The events $\{X_0 = i, i = 1, 2, \ldots\}$ are mutually exclusive, and one of them must occur. Hence, by the law of total probability [Eq. (1.44)],

    $P(X_1 = j) = \sum_i P(X_0 = i) P(X_1 = j \mid X_0 = i)$

or

    $p_j(1) = \sum_i p_i(0)\, p_{ij}$,  j = 1, 2, ...    (5.135)

In terms of vectors and matrices, Eq. (5.135) can be expressed as

    $p(1) = p(0)P$    (5.136)

Thus, Eq. (5.39) is true for n = 1. Assume now that Eq. (5.39) is true for n = k; that is,

    $p(k) = p(0)P^k$

Again, by the law of total probability,

    $P(X_{k+1} = j) = \sum_i P(X_k = i) P(X_{k+1} = j \mid X_k = i)$

or

    $p_j(k + 1) = \sum_i p_i(k)\, p_{ij}$,  j = 1, 2, ...    (5.137)

In terms of vectors and matrices, Eq. (5.137) can be expressed as

    $p(k + 1) = p(k)P = p(0)P^k P = p(0)P^{k+1}$    (5.138)

which indicates that Eq. (5.39) is true for k + 1. Hence, we conclude that Eq. (5.39) is true for all $n \ge 1$.


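5.30. Consider a two-state Markov chain with the transition probability matrix

    $P = \begin{bmatrix} 1 - a & a \\ b & 1 - b \end{bmatrix}$,  0 < a < 1, 0 < b < 1    (5.139)

(a) Show that the n-step transition probability matrix $P^n$ is given by

    $P^n = \frac{1}{a + b}\left\{ \begin{bmatrix} b & a \\ b & a \end{bmatrix} + (1 - a - b)^n \begin{bmatrix} a & -a \\ -b & b \end{bmatrix} \right\}$    (5.140)

(b) Find $P^n$ when $n \to \infty$.


(a) From matrix analysis, the characteristic equation of P is

    $c(\lambda) = |\lambda I - P| = \begin{vmatrix} \lambda - (1 - a) & -a \\ -b & \lambda - (1 - b) \end{vmatrix} = (\lambda - 1)(\lambda - 1 + a + b) = 0$

Thus, the eigenvalues of P are $\lambda_1 = 1$ and $\lambda_2 = 1 - a - b$. Then, using the spectral decomposition method, $P^n$ can be expressed as

    $P^n = \lambda_1^n E_1 + \lambda_2^n E_2$    (5.141)

where $E_1$ and $E_2$ are constituent matrices of P, given by

    $E_1 = \frac{1}{\lambda_1 - \lambda_2}[P - \lambda_2 I]$    $E_2 = \frac{1}{\lambda_2 - \lambda_1}[P - \lambda_1 I]$    (5.142)

Substituting $\lambda_1 = 1$ and $\lambda_2 = 1 - a - b$ in the above expressions, we obtain

    $E_1 = \frac{1}{a + b}\begin{bmatrix} b & a \\ b & a \end{bmatrix}$    $E_2 = \frac{1}{a + b}\begin{bmatrix} a & -a \\ -b & b \end{bmatrix}$

Thus, by Eq. (5.141), we obtain

    $P^n = E_1 + (1 - a - b)^n E_2 = \frac{1}{a + b}\left\{ \begin{bmatrix} b & a \\ b & a \end{bmatrix} + (1 - a - b)^n \begin{bmatrix} a & -a \\ -b & b \end{bmatrix} \right\}$    (5.143)


(b) If 0 < a < 1, 0 < b < 1, then 0 < 1 - a < 1 and |1 - a - b| < 1. So $\lim_{n \to \infty} (1 - a - b)^n = 0$ and

    $\lim_{n \to \infty} P^n = \frac{1}{a + b}\begin{bmatrix} b & a \\ b & a \end{bmatrix}$    (5.144)

Note that a limiting matrix exists and has the same rows (see Prob. 5.47).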

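5.31. An example of a two-state Markov chain is provided by a communication network consisting of the sequence (or cascade) of stages of binary communication channels shown in Fig. 5-10. Here $X_n$ denotes the digit leaving the nth stage of the channel and $X_0$ denotes the digit entering the first stage. The transition probability matrix of this communication network is often called the channel matrix and is given by Eq. (5.139); that is,

    $P = \begin{bmatrix} 1 - a & a \\ b & 1 - b \end{bmatrix}$,  0 < a < 1, 0 < b < 1


Fig. 5-10 Binary communication network.


Assume that a = 0.1 and b = 0.2, and the initial distribution is $P(X_0 = 0) = P(X_0 = 1) = 0.5$.

(a) Find the distribution of $X_n$.

(b) Find the distribution of $X_n$ when $n \to \infty$.


(a) The channel matrix of the communication network is

    $P = \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}$

and the initial distribution is

    $p(0) = [0.5 \quad 0.5]$

By Eq. (5.39), the distribution of $X_n$ is given by

    $p(n) = p(0)P^n = [0.5 \quad 0.5]\begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}^n$

Letting a = 0.1 and b = 0.2 in Eq. (5.140), we get

    $\begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}^n = \frac{1}{0.3}\begin{bmatrix} 0.2 & 0.1 \\ 0.2 & 0.1 \end{bmatrix} + \frac{(0.7)^n}{0.3}\begin{bmatrix} 0.1 & -0.1 \\ -0.2 & 0.2 \end{bmatrix} = \begin{bmatrix} \frac{2 + (0.7)^n}{3} & \frac{1 - (0.7)^n}{3} \\ \frac{2 - 2(0.7)^n}{3} & \frac{1 + 2(0.7)^n}{3} \end{bmatrix}$

Thus, the distribution of $X_n$ is

    $p(n) = [0.5 \quad 0.5]\begin{bmatrix} \frac{2 + (0.7)^n}{3} & \frac{1 - (0.7)^n}{3} \\ \frac{2 - 2(0.7)^n}{3} & \frac{1 + 2(0.7)^n}{3} \end{bmatrix} = \left[\frac{2}{3} - \frac{(0.7)^n}{6} \quad \frac{1}{3} + \frac{(0.7)^n}{6}\right]$

that is,

    $P(X_n = 0) = \frac{2}{3} - \frac{(0.7)^n}{6}$  and  $P(X_n = 1) = \frac{1}{3} + \frac{(0.7)^n}{6}$


(b) Since $\lim_{n \to \infty} (0.7)^n = 0$, the distribution of $X_n$ when $n \to \infty$ is

    $P(X_\infty = 0) = \frac{2}{3}$  and  $P(X_\infty = 1) = \frac{1}{3}$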
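
For a numerical view of this result, the distribution can be propagated stage by stage with p(k + 1) = p(k)P and compared with the closed form $2/3 - (0.7)^n/6$, $1/3 + (0.7)^n/6$. The sketch below uses floating-point arithmetic, so agreement is only up to rounding; the function name is illustrative.

```python
def channel_distribution(n, a=0.1, b=0.2, p0=(0.5, 0.5)):
    """Distribution of X_n, obtained by applying p(k+1) = p(k) P n times."""
    p = list(p0)
    for _ in range(n):
        p = [p[0] * (1 - a) + p[1] * b,      # P(X_{k+1} = 0)
             p[0] * a + p[1] * (1 - b)]      # P(X_{k+1} = 1)
    return p

for n in (0, 1, 5, 50):
    p_n = channel_distribution(n)
    closed = [2 / 3 - 0.7**n / 6, 1 / 3 + 0.7**n / 6]
    print(n, p_n, closed)
```

For large n both components settle near 2/3 and 1/3, in line with part (b).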


5.32. Verify the transitivity property of the Markov chain; that is, if $i \to j$ and $j \to k$, then $i \to k$.


By definition, the relations $i \to j$ and $j \to k$ imply that there exist integers n and m such that $p_{ij}^{(n)} > 0$ and $p_{jk}^{(m)} > 0$. Then, by the Chapman-Kolmogorov equation (5.38), we have

    $p_{ik}^{(n+m)} = \sum_r p_{ir}^{(n)} p_{rk}^{(m)} \ge p_{ij}^{(n)} p_{jk}^{(m)} > 0$    (5.145)

Therefore, $i \to k$.


5.33. Verify Eq. (5.42).


If the Markov chain $\{X_n\}$ goes from state i to state j in m steps, the first step must take the chain from i to some state k, where $k \ne j$. Now after that first step to k, we have m - 1 steps left, and the chain must get to state j, from state k, on the last of those steps. That is, the first visit to state j must occur on the (m - 1)st step, starting now in state k. Thus, we must have

    $f_{ij}^{(m)} = \sum_{k \ne j} p_{ik} f_{kj}^{(m-1)}$,  m = 2, 3, ...


5.34. Show that in a finite-state Markov chain, not all states can be transient.


Suppose that the states are 0, 1, ..., m, and suppose that they are all transient. Then by definition, after a finite amount of time (say $T_0$), state 0 will never be visited; after a finite amount of time (say $T_1$), state 1 will never be visited; and so on. Thus, after a finite time $T = \max\{T_0, T_1, \ldots, T_m\}$, no state will be visited. But as the process must be in some state after time T, we have a contradiction. Thus, we conclude that not all states can be transient and at least one of the states must be recurrent.


5.35. A state transition diagram of a finite-state Markov chain is a line diagram with a vertex corresponding to each state and a directed line between two vertices i and j if $p_{ij} > 0$. In such a diagram, if one can move from i to j by a path following the arrows, then $i \to j$. The diagram is useful to determine whether a finite-state Markov chain is irreducible or not, or to check for periodicities. Draw the state transition diagrams and classify the states of the Markov chains with the following transition probability matrices:

    (a) $P = \begin{bmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & 0.5 \\ 0.5 & 0.5 & 0 \end{bmatrix}$

    (b) $P = \begin{bmatrix} 0 & 0 & 0.5 & 0.5 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$

    (c) $P = \begin{bmatrix} 0.3 & 0.4 & 0 & 0 & 0.3 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.6 & 0.4 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$


Fig. 5-11 State transition diagram.


(a) The state transition diagram of the Markov chain with P of part (a) is shown in Fig. 5-11(a). From Fig. 5-11(a), it is seen that the Markov chain is irreducible and aperiodic. For instance, one can get back to state 0 in two steps by going from 0 to 1 to 0. However, one can also get back to state 0 in three steps by going from 0 to 1 to 2 to 0. Hence 0 is aperiodic. Similarly, we can see that states 1 and 2 are also aperiodic.

(b) The state transition diagram of the Markov chain with P of part (b) is shown in Fig. 5-11(b). From Fig. 5-11(b), it is seen that the Markov chain is irreducible and periodic with period 3.

(c) The state transition diagram of the Markov chain with P of part (c) is shown in Fig. 5-11(c). From Fig. 5-11(c), it is seen that the Markov chain is not irreducible, since states 0 and 4 do not communicate, and state 1 is absorbing.


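5.36. Consider a Markov chain with state space {0, 1} and transition probability matrix

    $P = \begin{bmatrix} 1 & 0 \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}$

(a) Show that state 0 is recurrent.
(b) Show that state 1 is transient.


(a) By Eqs. (5.41) and (5.42), we have

    $f_{00}^{(1)} = p_{00} = 1$    $f_{10}^{(1)} = p_{10} = \frac{1}{2}$
    $f_{00}^{(2)} = p_{01} f_{10}^{(1)} = (0)\frac{1}{2} = 0$
    $f_{00}^{(n)} = 0$,  $n \ge 2$

Then, by Eqs. (5.43),

    $f_{00} = P(T_0 < \infty \mid X_0 = 0) = \sum_{n=1}^{\infty} f_{00}^{(n)} = 1 + 0 + 0 + \cdots = 1$

Thus, by definition (5.44), state 0 is recurrent.

(b) Similarly, we have

    $f_{11}^{(1)} = p_{11} = \frac{1}{2}$    $f_{01}^{(1)} = p_{01} = 0$
    $f_{11}^{(2)} = p_{10} f_{01}^{(1)} = \left(\frac{1}{2}\right)(0) = 0$
    $f_{11}^{(n)} = 0$,  $n \ge 2$

and

    $f_{11} = P(T_1 < \infty \mid X_0 = 1) = \sum_{n=1}^{\infty} f_{11}^{(n)} = \frac{1}{2} + 0 + 0 + \cdots = \frac{1}{2} < 1$

Thus, by definition (5.48), state 1 is transient.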


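5.37. Consider a Markov chain with state space {0, 1, 2} and transition probability matrix

    $P = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$

Show that state 0 is periodic with period 2.


The characteristic equation of P is given by

    $c(\lambda) = |\lambda I - P| = \begin{vmatrix} \lambda & -\frac{1}{2} & -\frac{1}{2} \\ -1 & \lambda & 0 \\ -1 & 0 & \lambda \end{vmatrix} = \lambda^3 - \lambda = 0$

Thus, by the Cayley-Hamilton theorem (in matrix analysis), we have $P^3 = P$. Thus, for $n \ge 1$,

    $P^{(2n)} = P^2 = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix}$

    $P^{(2n+1)} = P = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$

Therefore,

    $d(0) = \gcd\{n \ge 1: p_{00}^{(n)} > 0\} = \gcd\{2, 4, 6, \ldots\} = 2$

Thus, state 0 is periodic with period 2.


Note that the state transition diagram corresponding to the given P is shown in Fig. 5-12. From Fig. 5-12, it is clear that state 0 is periodic with period 2.

Fig. 5-12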
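
The period can also be found by brute force: list the step counts n at which a return to state 0 has positive probability and take their greatest common divisor. The sketch below does this up to a finite horizon `max_n`, which is an assumption of the example (a finite horizon gives the true period here, but is only a heuristic in general). The matrix is the one read from the problem statement above.

```python
from fractions import Fraction
from math import gcd

def return_times(p, state, max_n):
    """Step counts n <= max_n with p_ii^(n) > 0 for i = state."""
    size = len(p)
    row = [Fraction(int(j == state)) for j in range(size)]   # start in `state`
    times = []
    for n in range(1, max_n + 1):
        row = [sum(row[k] * p[k][j] for k in range(size)) for j in range(size)]
        if row[state] > 0:
            times.append(n)
    return times

def period(p, state, max_n=20):
    """gcd of the return times found up to max_n (0 if there are none)."""
    d = 0
    for n in return_times(p, state, max_n):
        d = gcd(d, n)
    return d

half = Fraction(1, 2)
P = [[0, half, half],
     [1, 0, 0],
     [1, 0, 0]]
print(return_times(P, 0, 8), period(P, 0))
```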


5.38. Let two gamblers, A and B, initially have k dollars and m dollars, respectively. Suppose that at each round of their game, A wins one dollar from B with probability p and loses one dollar to B with probability q = 1 - p. Assume that A and B play until one of them has no money left. (This is known as the Gambler's Ruin problem.) Let $X_n$ be A's capital after round n, where n = 0, 1, 2, ... and $X_0 = k$.

(a) Show that $X(n) = \{X_n, n \ge 0\}$ is a Markov chain with absorbing states.

(b) Find its transition probability matrix P.


(a) The total capital of the two players at all times is

    k + m = N

Let $Z_n$ ($n \ge 1$) be independent r.v.'s with $P(Z_n = 1) = p$ and $P(Z_n = -1) = q = 1 - p$ for all n. Then

    $X_n = X_{n-1} + Z_n$,  n = 1, 2, ...

and $X_0 = k$. The game ends when $X_n = 0$ or $X_n = N$. Thus, by Probs. 5.2 and 5.24, $X(n) = \{X_n, n \ge 0\}$ is a Markov chain with state space E = {0, 1, 2, ..., N}, where states 0 and N are absorbing states. The Markov chain X(n) is also known as a simple random walk with absorbing barriers.


(b) Since

    $p_{i,i+1} = P(X_{n+1} = i + 1 \mid X_n = i) = p$
    $p_{i,i-1} = P(X_{n+1} = i - 1 \mid X_n = i) = q$
    $p_{i,i} = P(X_{n+1} = i \mid X_n = i) = 0$,  $i \ne 0, N$
    $p_{0,0} = P(X_{n+1} = 0 \mid X_n = 0) = 1$
    $p_{N,N} = P(X_{n+1} = N \mid X_n = N) = 1$

the transition probability matrix P is

    $P = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\ q & 0 & p & 0 & \cdots & 0 & 0 \\ 0 & q & 0 & p & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & & q & 0 & p \\ 0 & 0 & \cdots & & 0 & 0 & 1 \end{bmatrix}$    (5.146)

For example, when p = q = 1/2 and N = 4,

    $P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$
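
The structure of this matrix is easy to generate for any N and p. The sketch below builds it from the transition probabilities listed in part (b); the function name `gamblers_ruin_matrix` and its parameters are hypothetical names chosen for illustration.

```python
from fractions import Fraction

def gamblers_ruin_matrix(total, p):
    """(total+1) x (total+1) transition matrix on states 0..total, with
    absorbing barriers at 0 and total."""
    q = 1 - p
    size = total + 1
    matrix = [[Fraction(0)] * size for _ in range(size)]
    matrix[0][0] = Fraction(1)               # A has no money left
    matrix[total][total] = Fraction(1)       # B has no money left
    for i in range(1, total):
        matrix[i][i + 1] = p                 # A wins one dollar
        matrix[i][i - 1] = q                 # A loses one dollar
    return matrix

for row in gamblers_ruin_matrix(4, Fraction(1, 2)):
    print([str(x) for x in row])
```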


5.39. Consider a homogeneous Markov chain $X(n) = \{X_n, n \ge 0\}$ with a finite state space $E = \{0, 1, \ldots, N\}$, of which $A = \{0, 1, \ldots, m\}$, $m \ge 1$, is a set of absorbing states and $B = \{m+1, \ldots, N\}$ is a set of nonabsorbing states. It is assumed that at least one of the absorbing states in $A$ is accessible from any nonabsorbing state in $B$. Show that absorption of $X(n)$ in one or another of the absorbing states is certain.

If $X_0 \in A$, then there is nothing to prove, since $X(n)$ is already absorbed. Let $X_0 \in B$. By assumption, there is at least one state in $A$ which is accessible from any state in $B$. Now assume that state $k \in A$ is accessible from $j \in B$. Let $n_{jk}$ ($< \infty$) be the smallest number $n$ such that $p_{jk}^{(n)} > 0$. For a given state $j$, let $n_j$ be the largest of $n_{jk}$ as $k$ varies and $n'$ be the largest of $n_j$ as $j$ varies. After $n'$ steps, no matter what the initial state of $X(n)$, there is a probability $p > 0$ that $X(n)$ is in an absorbing state. Therefore,

$$P\{X_{n'} \in B\} = 1 - p$$

and $0 < 1 - p < 1$. It follows by homogeneity and the Markov property that

$$P\{X_{k(n')} \in B\} = (1 - p)^k \qquad k = 1, 2, \ldots$$

Now since $\lim_{k \to \infty} (1 - p)^k = 0$, we have

$$\lim_{n \to \infty} P\{X_n \in B\} = 0 \qquad \text{or} \qquad \lim_{n \to \infty} P\{X_n \in \bar{B} = A\} = 1$$

which shows that absorption of $X(n)$ in one or another of the absorbing states is certain.

5.40. Verify Eq. (5.50).


Let $X(n) = \{X_n, n \ge 0\}$ be a homogeneous Markov chain with a finite state space $E = \{0, 1, \ldots, N\}$, of which $A = \{0, 1, \ldots, m\}$, $m \ge 1$, is a set of absorbing states and $B = \{m+1, \ldots, N\}$ is a set of nonabsorbing states. Let state $k \in B$ at the first step go to $i \in E$ with probability $p_{ki}$. Then

$$u_{kj} = P\{X_n = j (\in A) \mid X_0 = k (\in B)\} = \sum_{i=1}^{N} p_{ki} P\{X_n = j (\in A) \mid X_0 = i\} \tag{5.147}$$

Now

$$P\{X_n = j (\in A) \mid X_0 = i\} = \begin{cases} 1 & i = j \\ 0 & i \in A,\ i \ne j \\ u_{ij} & i \in B,\ i = m+1, \ldots, N \end{cases}$$

Then Eq. (5.147) becomes

$$u_{kj} = p_{kj} + \sum_{i=m+1}^{N} p_{ki} u_{ij} \qquad k = m+1, \ldots, N;\ j = 1, \ldots, m \tag{5.148}$$

But $p_{kj}$, $k = m+1, \ldots, N$; $j = 1, \ldots, m$, are the elements of $R$, whereas $p_{ki}$, $k = m+1, \ldots, N$; $i = m+1, \ldots, N$, are the elements of $Q$ [see Eq. (5.49a)]. Hence, in matrix notation, Eq. (5.148) can be expressed as

$$U = R + QU \qquad \text{or} \qquad (I - Q)U = R \tag{5.149}$$

Premultiplying both sides of the second equation of Eq. (5.149) with $(I - Q)^{-1}$, we obtain

$$U = (I - Q)^{-1} R = \Phi R$$


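5.41. Consider a simple random walk $X(n)$ with absorbing barriers at state 0 and state $N = 3$ (see Prob. 5.38).

(a) Find the transition probability matrix $P$.
(b) Find the probabilities of absorption into states 0 and 3.

(a) The transition probability matrix $P$ is [Eq. (5.146)], with rows and columns indexed by the states 0, 1, 2, 3,

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

(b) Rearranging the transition probability matrix $P$ as in Eq. (5.49a), with rows and columns indexed by the states 0, 3, 1, 2,

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ q & 0 & 0 & p \\ 0 & p & q & 0 \end{bmatrix}$$

and by Eq. (5.49b), the matrices $Q$ and $R$ are given by

$$R = \begin{bmatrix} p_{10} & p_{13} \\ p_{20} & p_{23} \end{bmatrix} = \begin{bmatrix} q & 0 \\ 0 & p \end{bmatrix} \qquad Q = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} 0 & p \\ q & 0 \end{bmatrix}$$

Then

$$I - Q = \begin{bmatrix} 1 & -p \\ -q & 1 \end{bmatrix}$$

and

$$\Phi = (I - Q)^{-1} = \frac{1}{1 - pq} \begin{bmatrix} 1 & p \\ q & 1 \end{bmatrix} \tag{5.150}$$

By Eq. (5.50),

$$U = \begin{bmatrix} u_{10} & u_{13} \\ u_{20} & u_{23} \end{bmatrix} = \Phi R = \frac{1}{1 - pq} \begin{bmatrix} 1 & p \\ q & 1 \end{bmatrix} \begin{bmatrix} q & 0 \\ 0 & p \end{bmatrix} = \frac{1}{1 - pq} \begin{bmatrix} q & p^2 \\ q^2 & p \end{bmatrix} \tag{5.151}$$

Thus, the probabilities of absorption into state 0 from states 1 and 2 are given, respectively, by

$$u_{10} = \frac{q}{1 - pq} \qquad \text{and} \qquad u_{20} = \frac{q^2}{1 - pq}$$

and the probabilities of absorption into state 3 from states 1 and 2 are given, respectively, by

$$u_{13} = \frac{p^2}{1 - pq} \qquad \text{and} \qquad u_{23} = \frac{p}{1 - pq}$$

Note that

$$u_{10} + u_{13} = \frac{q + p^2}{1 - pq} = \frac{1 - p + p^2}{1 - p(1 - p)} = 1$$
$$u_{20} + u_{23} = \frac{q^2 + p}{1 - pq} = \frac{q^2 + (1 - q)}{1 - (1 - q)q} = 1$$

which confirm the proposition of Prob. 5.39.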
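5.42. Consider the simple random walk $X(n)$ with absorbing barriers at 0 and 3 (Prob. 5.41). Find the expected time (or steps) to absorption when $X_0 = 1$ and when $X_0 = 2$.

The fundamental matrix $\Phi$ of $X(n)$ is [Eq. (5.150)]

$$\Phi = \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix} = \frac{1}{1 - pq} \begin{bmatrix} 1 & p \\ q & 1 \end{bmatrix}$$

Let $T_i$ be the time to absorption when $X_0 = i$. Then by Eq. (5.51), we get

$$E(T_1) = \frac{1}{1 - pq}(1 + p) \qquad E(T_2) = \frac{1}{1 - pq}(q + 1) \tag{5.152}$$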
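As a hedged numerical check of Probs. 5.41 and 5.42, the sketch below forms $Q$ and $R$ for the walk on $\{0, 1, 2, 3\}$, inverts $I - Q$ with the 2 x 2 adjugate formula, and returns $\Phi$, $U = \Phi R$, and the row sums of $\Phi$ (the expected absorption times). The function name `absorption_n3` is an illustrative assumption, and exact fractions are used so the results can be compared with $q/(1-pq)$, $p^2/(1-pq)$, and $(1+p)/(1-pq)$ directly.

```python
from fractions import Fraction

def absorption_n3(p):
    """Return (Phi, U, expected_times) for the walk on {0, 1, 2, 3}.
    Rows refer to starting states 1 and 2; columns of U to states 0 and 3."""
    p = Fraction(p)
    q = 1 - p
    Q = [[Fraction(0), p], [q, Fraction(0)]]      # transitions among states 1, 2
    R = [[q, Fraction(0)], [Fraction(0), p]]      # transitions into states 0, 3
    # Entries of I - Q, then its inverse by the adjugate formula.
    a, b = 1 - Q[0][0], -Q[0][1]
    c, d = -Q[1][0], 1 - Q[1][1]
    det = a * d - b * c                           # equals 1 - p*q
    Phi = [[d / det, -b / det], [-c / det, a / det]]
    U = [[sum(Phi[i][k] * R[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    times = [sum(row) for row in Phi]             # expected steps to absorption
    return Phi, U, times

Phi, U, times = absorption_n3(Fraction(1, 3))
print(U, times)
```

Every row of `U` sums to 1, in agreement with the certainty of absorption shown in Prob. 5.39.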
5.43. Consider the gambler's game described in Prob. 5.38. What is the probability of A's losing all his money?

Let $P(k)$, $k = 0, 1, 2, \ldots, N$, denote the probability that A loses all his money when his initial capital is $k$ dollars. Equivalently, $P(k)$ is the probability of absorption at state 0 when $X_0 = k$ in the simple random walk $X(n)$ with absorbing barriers at states 0 and $N$. Now if $0 < k < N$, then

$$P(k) = pP(k + 1) + qP(k - 1) \qquad k = 1, 2, \ldots, N - 1 \tag{5.153}$$

where $pP(k + 1)$ is the probability that A wins the first round and subsequently loses all his money and $qP(k - 1)$ is the probability that A loses the first round and subsequently loses all his money. Rewriting Eq. (5.153), we have

$$P(k + 1) - \frac{1}{p} P(k) + \frac{q}{p} P(k - 1) = 0 \qquad k = 1, 2, \ldots, N - 1 \tag{5.154}$$

which is a second-order homogeneous linear constant-coefficient difference equation. Next, we have

$$P(0) = 1 \qquad \text{and} \qquad P(N) = 0 \tag{5.155}$$

since if $k = 0$, absorption at 0 is a sure event, and if $k = N$, absorption at $N$ has occurred and absorption at 0 is impossible. Thus, finding $P(k)$ reduces to solving Eq. (5.154) subject to the boundary conditions given by Eq. (5.155). Let $P(k) = r^k$. Then Eq. (5.154) becomes

$$r^{k+1} - \frac{1}{p} r^k + \frac{q}{p} r^{k-1} = 0 \qquad p + q = 1$$

Setting $k = 1$ (and noting that $p + q = 1$), we get

$$r^2 - \frac{1}{p} r + \frac{q}{p} = (r - 1)\left(r - \frac{q}{p}\right) = 0$$

from which we get $r = 1$ and $r = q/p$. Thus,

$$P(k) = c_1 + c_2 \left(\frac{q}{p}\right)^k \qquad q \ne p \tag{5.156}$$

where $c_1$ and $c_2$ are arbitrary constants. Now, by Eq. (5.155),

$$P(0) = 1 \rightarrow c_1 + c_2 = 1$$
$$P(N) = 0 \rightarrow c_1 + c_2 \left(\frac{q}{p}\right)^N = 0$$

Solving for $c_1$ and $c_2$, we obtain

$$c_1 = \frac{-(q/p)^N}{1 - (q/p)^N} \qquad c_2 = \frac{1}{1 - (q/p)^N}$$

Hence,

$$P(k) = \frac{(q/p)^k - (q/p)^N}{1 - (q/p)^N} \qquad q \ne p \tag{5.157}$$

Note that if $N \gg k$,

$$P(k) \approx \begin{cases} 1 & q > p \\ \left(\dfrac{q}{p}\right)^k & p > q \end{cases} \tag{5.158}$$

Setting $r = q/p$ in Eq. (5.157), we have

$$P(k) = \frac{r^k - r^N}{1 - r^N} \xrightarrow[r \to 1]{} 1 - \frac{k}{N}$$

Thus, when $p = q = \frac{1}{2}$,

$$P(k) = 1 - \frac{k}{N} \tag{5.159}$$

5.44. Show that Eq. (5.157) is consistent with Eq. (5.151).

Substituting $k = 1$ and $N = 3$ in Eq. (5.157), and noting that $p + q = 1$, we have

$$P(1) = \frac{(q/p) - (q/p)^3}{1 - (q/p)^3} = \frac{q(p^2 - q^2)}{p^3 - q^3} = \frac{q(p + q)}{p^2 + pq + q^2} = \frac{q}{(p + q)^2 - pq} = \frac{q}{1 - pq}$$

Now from Eq. (5.151), we have

$$u_{10} = \frac{q}{1 - pq} = P(1)$$
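The closed form for the ruin probability can be checked against the difference equation it was derived from. The sketch below is illustrative only (the names `ruin_probability` and `satisfies_recursion` are assumptions made here); it evaluates $P(k)$ exactly with fractions and confirms both the boundary conditions $P(0) = 1$, $P(N) = 0$ and the recursion $P(k) = pP(k+1) + qP(k-1)$.

```python
from fractions import Fraction

def ruin_probability(k, N, p):
    """Probability that A, starting with k dollars, is ruined (absorbed at 0)."""
    p = Fraction(p)
    q = 1 - p
    if p == q:                      # fair game: P(k) = 1 - k/N
        return 1 - Fraction(k, N)
    r = q / p
    return (r**k - r**N) / (1 - r**N)

def satisfies_recursion(N, p):
    """Check the boundary conditions and P(k) = p P(k+1) + q P(k-1)."""
    p = Fraction(p)
    q = 1 - p
    P = [ruin_probability(k, N, p) for k in range(N + 1)]
    boundary_ok = (P[0] == 1 and P[N] == 0)
    interior_ok = all(P[k] == p * P[k + 1] + q * P[k - 1] for k in range(1, N))
    return boundary_ok and interior_ok

print(ruin_probability(1, 3, Fraction(1, 3)))   # compare with q / (1 - p*q)
print(satisfies_recursion(10, Fraction(2, 5)))
```

With $k = 1$, $N = 3$ the value agrees with $u_{10} = q/(1 - pq)$, which is the consistency claimed in Prob. 5.44.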


5.45. Consider the simple random walk $X(n)$ with state space $E = \{0, 1, 2, \ldots, N\}$, where 0 and $N$ are absorbing states (Prob. 5.38). Let r.v. $T_k$ denote the time (or number of steps) to absorption of $X(n)$ when $X_0 = k$, $k = 0, 1, \ldots, N$. Find $E(T_k)$.

Let $Y(k) = E(T_k)$. Clearly, if $k = 0$ or $k = N$, then absorption is immediate, and we have

$$Y(0) = Y(N) = 0 \tag{5.160}$$

Let the probability that absorption takes $m$ steps when $X_0 = k$ be defined by

$$P(k, m) = P(T_k = m) \qquad m = 1, 2, \ldots \tag{5.161}$$

Then, we have (Fig. 5-13)

$$P(k, m) = pP(k + 1, m - 1) + qP(k - 1, m - 1) \tag{5.162}$$

and

$$Y(k) = E(T_k) = \sum_{m=1}^{\infty} m P(k, m) = p \sum_{m=1}^{\infty} m P(k + 1, m - 1) + q \sum_{m=1}^{\infty} m P(k - 1, m - 1)$$

Setting $m - 1 = i$, we get

$$Y(k) = p \sum_{i=0}^{\infty} (i + 1) P(k + 1, i) + q \sum_{i=0}^{\infty} (i + 1) P(k - 1, i)$$
$$= p \sum_{i=0}^{\infty} i P(k + 1, i) + q \sum_{i=0}^{\infty} i P(k - 1, i) + p \sum_{i=0}^{\infty} P(k + 1, i) + q \sum_{i=0}^{\infty} P(k - 1, i)$$

Fig. 5-13 Simple random walk with absorbing barriers.

Now by the result of Prob. 5.39, we see that absorption is certain; therefore,

$$\sum_{i=0}^{\infty} P(k + 1, i) = \sum_{i=0}^{\infty} P(k - 1, i) = 1$$

Thus,

$$Y(k) = pY(k + 1) + qY(k - 1) + p + q$$

or

$$Y(k) = pY(k + 1) + qY(k - 1) + 1 \qquad k = 1, 2, \ldots, N - 1 \tag{5.163}$$

Rewriting Eq. (5.163), we have

$$Y(k + 1) - \frac{1}{p} Y(k) + \frac{q}{p} Y(k - 1) = -\frac{1}{p} \tag{5.164}$$

Thus, finding $Y(k)$ reduces to solving Eq. (5.164) subject to the boundary conditions given by Eq. (5.160). Let the general solution of Eq. (5.164) be

$$Y(k) = Y_h(k) + Y_p(k)$$

where $Y_h(k)$ is the homogeneous solution satisfying

$$Y_h(k + 1) - \frac{1}{p} Y_h(k) + \frac{q}{p} Y_h(k - 1) = 0 \tag{5.165}$$

and $Y_p(k)$ is the particular solution satisfying

$$Y_p(k + 1) - \frac{1}{p} Y_p(k) + \frac{q}{p} Y_p(k - 1) = -\frac{1}{p} \tag{5.166}$$

Let $Y_p(k) = ak$, where $a$ is a constant. Then Eq. (5.166) becomes

$$(k + 1)a - \frac{1}{p} ka + \frac{q}{p}(k - 1)a = -\frac{1}{p}$$

from which we get $a = 1/(q - p)$ and

$$Y_p(k) = \frac{k}{q - p} \qquad p \ne q \tag{5.167}$$

Since Eq. (5.165) is the same as Eq. (5.154), by Eq. (5.156), we obtain

$$Y_h(k) = c_1 + c_2 \left(\frac{q}{p}\right)^k \qquad q \ne p \tag{5.168}$$

where $c_1$ and $c_2$ are arbitrary constants. Hence, the general solution of Eq. (5.164) is

$$Y(k) = c_1 + c_2 \left(\frac{q}{p}\right)^k + \frac{k}{q - p} \qquad q \ne p \tag{5.169}$$

Now, by Eq. (5.160),

$$Y(0) = 0 \rightarrow c_1 + c_2 = 0$$
$$Y(N) = 0 \rightarrow c_1 + c_2 \left(\frac{q}{p}\right)^N + \frac{N}{q - p} = 0$$

Solving for $c_1$ and $c_2$, we obtain

$$c_1 = \frac{-N/(q - p)}{1 - (q/p)^N} \qquad c_2 = \frac{N/(q - p)}{1 - (q/p)^N}$$

Substituting these values in Eq. (5.169), we obtain (for $p \ne q$)

$$Y(k) = E(T_k) = \frac{1}{q - p} \left[ k - N \frac{1 - (q/p)^k}{1 - (q/p)^N} \right] \tag{5.170}$$

When $p = q = \frac{1}{2}$, we have

$$Y(k) = E(T_k) = k(N - k) \qquad p = q = \frac{1}{2} \tag{5.171}$$
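A quick way to gain confidence in the expected-duration formula is to substitute it back into $Y(k) = pY(k+1) + qY(k-1) + 1$ with $Y(0) = Y(N) = 0$. The sketch below does this with exact fractions; the function names are assumptions chosen for illustration, and the check is a verification of the stated closed form rather than an independent derivation.

```python
from fractions import Fraction

def expected_duration(k, N, p):
    """Expected number of rounds until absorption, starting from capital k."""
    p = Fraction(p)
    q = 1 - p
    if p == q:                                  # fair game: k (N - k)
        return Fraction(k * (N - k))
    r = q / p
    return (k - N * (1 - r**k) / (1 - r**N)) / (q - p)

def duration_recursion_holds(N, p):
    """Check Y(0) = Y(N) = 0 and Y(k) = p Y(k+1) + q Y(k-1) + 1."""
    p = Fraction(p)
    q = 1 - p
    Y = [expected_duration(k, N, p) for k in range(N + 1)]
    if Y[0] != 0 or Y[N] != 0:
        return False
    return all(Y[k] == p * Y[k + 1] + q * Y[k - 1] + 1 for k in range(1, N))

print(expected_duration(1, 3, Fraction(1, 3)))  # compare with (1 + p) / (1 - p*q)
print(duration_recursion_holds(8, Fraction(2, 5)))
```

For $N = 3$ and $k = 1$ the value coincides with $E(T_1) = (1 + p)/(1 - pq)$ obtained from the fundamental matrix in Prob. 5.42.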


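5.46. Consider a Markov chain with two states and transition probability matrix

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

(a) Find the stationary distribution $\hat{\mathbf{p}}$ of the chain.

(b) Find $\lim_{n \to \infty} P^n$.

(a) By definition (5.52),

$$\hat{\mathbf{p}} P = \hat{\mathbf{p}}$$

or

$$\begin{bmatrix} p_1 & p_2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} p_1 & p_2 \end{bmatrix}$$

which yields $p_1 = p_2$. Since $p_1 + p_2 = 1$, we obtain

$$\hat{\mathbf{p}} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \end{bmatrix}$$

(b) Now

$$P^n = \begin{cases} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} & n = 1, 3, 5, \ldots \\[2ex] \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} & n = 2, 4, 6, \ldots \end{cases}$$

and $\lim_{n \to \infty} P^n$ does not exist.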


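5.47. Consider a Markov chain with two states and transition probability matrix

$$P = \begin{bmatrix} \frac{3}{4} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}$$

(a) Find the stationary distribution $\hat{\mathbf{p}}$ of the chain.

(b) Find $\lim_{n \to \infty} P^n$.

(c) Find $\lim_{n \to \infty} P^n$ by first evaluating $P^n$.

(a) By definition (5.52), we have

$$\hat{\mathbf{p}} P = \hat{\mathbf{p}}$$

or

$$\begin{bmatrix} p_1 & p_2 \end{bmatrix} \begin{bmatrix} \frac{3}{4} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix} = \begin{bmatrix} p_1 & p_2 \end{bmatrix}$$

which yields

$$\tfrac{3}{4} p_1 + \tfrac{1}{2} p_2 = p_1$$
$$\tfrac{1}{4} p_1 + \tfrac{1}{2} p_2 = p_2$$

Each of these equations is equivalent to $p_1 = 2p_2$. Since $p_1 + p_2 = 1$, we obtain

$$\hat{\mathbf{p}} = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \end{bmatrix}$$

(b) Since the Markov chain is regular, by Eq. (5.53), we obtain

$$\lim_{n \to \infty} P^n = \lim_{n \to \infty} \begin{bmatrix} \frac{3}{4} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}^n = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} \end{bmatrix}$$

(c) Setting $a = \frac{1}{4}$ and $b = \frac{1}{2}$ in Eq. (5.143) (Prob. 5.30), we get

$$P^n = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} \end{bmatrix} + \left(\frac{1}{4}\right)^n \begin{bmatrix} \frac{1}{3} & -\frac{1}{3} \\ -\frac{2}{3} & \frac{2}{3} \end{bmatrix}$$

Since $\lim_{n \to \infty} \left(\frac{1}{4}\right)^n = 0$, we obtain

$$\lim_{n \to \infty} P^n = \lim_{n \to \infty} \begin{bmatrix} \frac{3}{4} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}^n = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} \end{bmatrix}$$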

Poisson Processes 


5.48. Let $T_n$ denote the arrival time of the $n$th customer at a service station. Let $Z_n$ denote the time interval between the arrival of the $n$th customer and the $(n - 1)$st customer; that is,

$$Z_n = T_n - T_{n-1} \qquad n \ge 1 \tag{5.172}$$

and $T_0 = 0$. Let $\{X(t), t \ge 0\}$ be the counting process associated with $\{T_n, n \ge 0\}$. Show that if $X(t)$ has stationary increments, then $Z_n$, $n = 1, 2, \ldots$, are identically distributed r.v.'s.

We have

$$P(Z_n > z) = 1 - P(Z_n \le z) = 1 - F_{Z_n}(z)$$

By Eq. (5.172),

$$P(Z_n > z) = P(T_n - T_{n-1} > z) = P(T_n > T_{n-1} + z)$$

Suppose that the observed value of $T_{n-1}$ is $t_{n-1}$. The event $(T_n > T_{n-1} + z \mid T_{n-1} = t_{n-1})$ occurs if and only if $X(t)$ does not change count during the time interval $(t_{n-1}, t_{n-1} + z)$ (Fig. 5-14). Thus,

$$P(Z_n > z \mid T_{n-1} = t_{n-1}) = P(T_n > T_{n-1} + z \mid T_{n-1} = t_{n-1}) = P[X(t_{n-1} + z) - X(t_{n-1}) = 0]$$

or

$$P(Z_n \le z \mid T_{n-1} = t_{n-1}) = 1 - P[X(t_{n-1} + z) - X(t_{n-1}) = 0] \tag{5.173}$$

Since $X(t)$ has stationary increments, the probability on the right-hand side of Eq. (5.173) is a function only of the time difference $z$. Thus,

$$P(Z_n \le z \mid T_{n-1} = t_{n-1}) = 1 - P[X(z) = 0] \tag{5.174}$$

which shows that the conditional distribution function on the left-hand side of Eq. (5.174) is independent of the particular value of $n$ in this case, and hence we have

$$F_{Z_n}(z) = P(Z_n \le z) = 1 - P[X(z) = 0] \tag{5.175}$$

which shows that the cdf of $Z_n$ is independent of $n$. Thus, we conclude that the $Z_n$'s are identically distributed r.v.'s.

Fig. 5-14


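5.49. Show that Definition 5.6.2 implies Definition 5.6.1.

Let $p_n(t) = P[X(t) = n]$. Then, by condition 2 of Definition 5.6.2, we have

$$p_0(t + \Delta t) = P[X(t + \Delta t) = 0] = P[X(t) = 0,\ X(t + \Delta t) - X(t) = 0]$$
$$= P[X(t) = 0]\, P[X(t + \Delta t) - X(t) = 0]$$

Now, by Eq. (5.59), we have

$$P[X(t + \Delta t) - X(t) = 0] = 1 - \lambda \Delta t + o(\Delta t)$$

Thus,

$$p_0(t + \Delta t) = p_0(t)[1 - \lambda \Delta t + o(\Delta t)]$$

or

$$\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -\lambda p_0(t) + \frac{o(\Delta t)}{\Delta t}$$

Letting $\Delta t \to 0$, and by Eq. (5.58), we obtain

$$p_0'(t) = -\lambda p_0(t) \tag{5.176}$$

Solving the above differential equation, we get

$$p_0(t) = k e^{-\lambda t}$$

where $k$ is an integration constant. Since $p_0(0) = P[X(0) = 0] = 1$, we obtain

$$p_0(t) = e^{-\lambda t} \tag{5.177}$$

Similarly, for $n > 0$,

$$p_n(t + \Delta t) = P[X(t + \Delta t) = n]$$
$$= P[X(t) = n,\ X(t + \Delta t) - X(t) = 0] + P[X(t) = n - 1,\ X(t + \Delta t) - X(t) = 1] + \sum_{k=2}^{n} P[X(t) = n - k,\ X(t + \Delta t) - X(t) = k]$$

Now, by condition 4 of Definition 5.6.2, the last term in the above expression is $o(\Delta t)$. Thus, by conditions 2 and 3 of Definition 5.6.2, we have

$$p_n(t + \Delta t) = p_n(t)[1 - \lambda \Delta t + o(\Delta t)] + p_{n-1}(t)[\lambda \Delta t + o(\Delta t)] + o(\Delta t)$$

Thus,

$$\frac{p_n(t + \Delta t) - p_n(t)}{\Delta t} = -\lambda p_n(t) + \lambda p_{n-1}(t) + \frac{o(\Delta t)}{\Delta t}$$

and letting $\Delta t \to 0$ yields

$$p_n'(t) + \lambda p_n(t) = \lambda p_{n-1}(t) \tag{5.178}$$

Multiplying both sides by $e^{\lambda t}$, we get

$$e^{\lambda t}[p_n'(t) + \lambda p_n(t)] = \lambda e^{\lambda t} p_{n-1}(t)$$

Hence,

$$\frac{d}{dt}\left[e^{\lambda t} p_n(t)\right] = \lambda e^{\lambda t} p_{n-1}(t) \tag{5.179}$$

Then by Eq. (5.177), we have

$$\frac{d}{dt}\left[e^{\lambda t} p_1(t)\right] = \lambda$$

or

$$p_1(t) = (\lambda t + c) e^{-\lambda t}$$

where $c$ is an integration constant. Since $p_1(0) = P[X(0) = 1] = 0$, we obtain

$$p_1(t) = \lambda t e^{-\lambda t} \tag{5.180}$$

To show that

$$p_n(t) = e^{-\lambda t} \frac{(\lambda t)^n}{n!}$$

we use mathematical induction. Assume that it is true for $n - 1$; that is,

$$p_{n-1}(t) = e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n - 1)!}$$

Substituting the above expression into Eq. (5.179), we have

$$\frac{d}{dt}\left[e^{\lambda t} p_n(t)\right] = \frac{\lambda^n t^{n-1}}{(n - 1)!}$$

Integrating, we get

$$e^{\lambda t} p_n(t) = \frac{(\lambda t)^n}{n!} + c_1$$

Since $p_n(0) = 0$, $c_1 = 0$, and we obtain

$$p_n(t) = e^{-\lambda t} \frac{(\lambda t)^n}{n!} \tag{5.181}$$

which is Eq. (5.55) of Definition 5.6.1. Thus we conclude that Definition 5.6.2 implies Definition 5.6.1.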
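The differential-difference equation $p_n'(t) + \lambda p_n(t) = \lambda p_{n-1}(t)$ can be spot-checked numerically against the Poisson probabilities. The sketch below approximates the derivative with a central difference, so the residual is only expected to be small, not exactly zero; the step size `h`, the rate, and the function names are illustrative assumptions.

```python
import math

def poisson_pmf(n, lam, t):
    """p_n(t) = exp(-lam t) (lam t)^n / n! for a Poisson process of rate lam."""
    return math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)

def ode_residual(n, lam, t, h=1e-5):
    """Residual of p_n'(t) + lam p_n(t) - lam p_{n-1}(t), with p_{-1} = 0.
    The derivative is approximated by a central difference of step h."""
    deriv = (poisson_pmf(n, lam, t + h) - poisson_pmf(n, lam, t - h)) / (2 * h)
    prev = poisson_pmf(n - 1, lam, t) if n > 0 else 0.0
    return deriv + lam * poisson_pmf(n, lam, t) - lam * prev

for n in range(4):
    print(n, ode_residual(n, lam=2.0, t=1.5))
```

For $n = 0$ the same residual reduces to $p_0'(t) + \lambda p_0(t)$, the equation solved first in the text.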


5.50. Verify Eq. (5.59).

We note first that $X(t)$ can assume only nonnegative integer values; therefore, the same is true for the counting increment $X(t + \Delta t) - X(t)$. Thus, summing over all possible values of the increment, we get

$$\sum_{k=0}^{\infty} P[X(t + \Delta t) - X(t) = k] = P[X(t + \Delta t) - X(t) = 0] + P[X(t + \Delta t) - X(t) = 1] + P[X(t + \Delta t) - X(t) \ge 2] = 1$$

Substituting conditions 3 and 4 of Definition 5.6.2 into the above equation, we obtain

$$P[X(t + \Delta t) - X(t) = 0] = 1 - \lambda \Delta t + o(\Delta t)$$


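5.51. (a) Using the Poisson probability distribution in Eq. (5.181), obtain an analytical expression for the correction term $o(\Delta t)$ in the expression (condition 3 of Definition 5.6.2)

$$P[X(t + \Delta t) - X(t) = 1] = \lambda \Delta t + o(\Delta t) \tag{5.182}$$

(b) Show that this correction term does have the property of Eq. (5.58); that is,

$$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0$$

(a) Since the Poisson process $X(t)$ has stationary increments, Eq. (5.182) can be rewritten as

$$P[X(\Delta t) = 1] = p_1(\Delta t) = \lambda \Delta t + o(\Delta t) \tag{5.183}$$

Using Eq. (5.181) [or Eq. (5.180)], we have

$$p_1(\Delta t) = \lambda \Delta t\, e^{-\lambda \Delta t} = \lambda \Delta t (1 + e^{-\lambda \Delta t} - 1) = \lambda \Delta t + \lambda \Delta t (e^{-\lambda \Delta t} - 1)$$

Equating the above expression with Eq. (5.183), we get

$$\lambda \Delta t + o(\Delta t) = \lambda \Delta t + \lambda \Delta t (e^{-\lambda \Delta t} - 1)$$

from which we obtain

$$o(\Delta t) = \lambda \Delta t (e^{-\lambda \Delta t} - 1) \tag{5.184}$$

(b) From Eq. (5.184), we have

$$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{\lambda \Delta t (e^{-\lambda \Delta t} - 1)}{\Delta t} = \lim_{\Delta t \to 0} \lambda (e^{-\lambda \Delta t} - 1) = 0$$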


5.52. Find the autocorrelation function $R_X(t, s)$ and the autocovariance function $K_X(t, s)$ of a Poisson process $X(t)$ with rate $\lambda$.

From Eqs. (5.56) and (5.57),

$$E[X(t)] = \lambda t \qquad \mathrm{Var}[X(t)] = \lambda t$$

Now, the Poisson process $X(t)$ is a random process with stationary independent increments and $X(0) = 0$. Thus, by Eq. (5.126) (Prob. 5.23), we obtain

$$K_X(t, s) = \sigma_1^2 \min(t, s) = \lambda \min(t, s) \tag{5.185}$$

since $\sigma_1^2 = \mathrm{Var}[X(1)] = \lambda$. Next, since $E[X(t)]E[X(s)] = \lambda^2 ts$, by Eq. (5.10), we obtain

$$R_X(t, s) = \lambda \min(t, s) + \lambda^2 ts \tag{5.186}$$
5.53. Show that the time intervals between successive events (or interarrival times) in a Poisson process $X(t)$ with rate $\lambda$ are independent and identically distributed exponential r.v.'s with parameter $\lambda$.

Let $Z_1, Z_2, \ldots$ be the r.v.'s representing the lengths of interarrival times in the Poisson process $X(t)$. First, notice that $\{Z_1 > t\}$ takes place if and only if no event of the Poisson process occurs in the interval $(0, t)$, and thus by Eq. (5.177),

$$P(Z_1 > t) = P\{X(t) = 0\} = e^{-\lambda t}$$

or

$$F_{Z_1}(t) = P(Z_1 \le t) = 1 - e^{-\lambda t}$$

Hence, $Z_1$ is an exponential r.v. with parameter $\lambda$ [Eq. (2.61)]. Let $f_1(t)$ be the pdf of $Z_1$. Then we have

$$P(Z_2 > t) = \int_0^{\infty} P(Z_2 > t \mid Z_1 = \tau) f_1(\tau)\, d\tau = \int_0^{\infty} P[X(t + \tau) - X(\tau) = 0] f_1(\tau)\, d\tau = e^{-\lambda t} \int_0^{\infty} f_1(\tau)\, d\tau = e^{-\lambda t} \tag{5.187}$$

which indicates that $Z_2$ is also an exponential r.v. with parameter $\lambda$ and is independent of $Z_1$. Repeating the same argument, we conclude that $Z_1, Z_2, \ldots$ are iid exponential r.v.'s with parameter $\lambda$.

5.54. Let $T_n$ denote the time of the $n$th event of a Poisson process $X(t)$ with rate $\lambda$. Show that $T_n$ is a gamma r.v. with parameters $(n, \lambda)$.

Clearly,

$$T_n = Z_1 + Z_2 + \cdots + Z_n$$

where $Z_n$, $n = 1, 2, \ldots$, are the interarrival times defined by Eq. (5.172). From Prob. 5.53, we know that $Z_n$ are iid exponential r.v.'s with parameter $\lambda$. Now, using the result of Prob. 4.39, we see that $T_n$ is a gamma r.v. with parameters $(n, \lambda)$, and its pdf is given by [Eq. (2.65)]:

$$f_{T_n}(t) = \begin{cases} \lambda e^{-\lambda t} \dfrac{(\lambda t)^{n-1}}{(n - 1)!} & t > 0 \\ 0 & t < 0 \end{cases} \tag{5.188}$$

The random process $\{T_n, n \ge 1\}$ is often called an arrival process.

5.55. Suppose $t$ is not a point at which an event occurs in a Poisson process $X(t)$ with rate $\lambda$. Let $W(t)$ be the r.v. representing the time until the next occurrence of an event. Show that the distribution of $W(t)$ is independent of $t$ and $W(t)$ is an exponential r.v. with parameter $\lambda$.


Let $s$ ($0 \le s < t$) be the point at which the last event [say the $(n - 1)$st event] occurred (Fig. 5-15). The event $\{W(t) > \tau\}$ is equivalent to the event

$$\{Z_n > t - s + \tau \mid Z_n > t - s\}$$

Fig. 5-15

Thus, using Eq. (5.187), we have

$$P[W(t) > \tau] = P(Z_n > t - s + \tau \mid Z_n > t - s) = \frac{P(Z_n > t - s + \tau)}{P(Z_n > t - s)} = \frac{e^{-\lambda(t - s + \tau)}}{e^{-\lambda(t - s)}} = e^{-\lambda \tau}$$

and

$$P[W(t) \le \tau] = 1 - e^{-\lambda \tau} \tag{5.189}$$

which indicates that $W(t)$ is an exponential r.v. with parameter $\lambda$ and is independent of $t$. Note that $W(t)$ is often called a waiting time.

5.56. Patients arrive at the doctor's office according to a Poisson process with rate $\lambda = \frac{1}{10}$ per minute. The doctor will not see a patient until at least three patients are in the waiting room.

(a) Find the expected waiting time until the first patient is admitted to see the doctor.

(b) What is the probability that nobody is admitted to see the doctor in the first hour?


(a) Let $T_n$ denote the arrival time of the $n$th patient at the doctor's office. Then

$$T_n = Z_1 + Z_2 + \cdots + Z_n$$

where $Z_n$, $n = 1, 2, \ldots$, are iid exponential r.v.'s with parameter $\lambda = \frac{1}{10}$. By Eqs. (4.132) and (2.62),

$$E(T_n) = E\left(\sum_{i=1}^{n} Z_i\right) = \sum_{i=1}^{n} E(Z_i) = n \frac{1}{\lambda} \tag{5.190}$$

The expected waiting time until the first patient is admitted to see the doctor is

$$E(T_3) = 3(10) = 30 \text{ minutes}$$

(b) Let $X(t)$ be the Poisson process with parameter $\lambda = \frac{1}{10}$. The probability that nobody is admitted to see the doctor in the first hour is the same as the probability that at most two patients arrive in the first 60 minutes. Thus, by Eq. (5.55),

$$P[X(60) - X(0) \le 2] = P[X(60) - X(0) = 0] + P[X(60) - X(0) = 1] + P[X(60) - X(0) = 2]$$
$$= e^{-60/10} + e^{-60/10}\left(\frac{60}{10}\right) + e^{-60/10}\, \frac{1}{2}\left(\frac{60}{10}\right)^2$$
$$= e^{-6}(1 + 6 + 18) \approx 0.062$$
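The two numbers in Prob. 5.56 are easy to reproduce. The following sketch, with variable names chosen here purely for illustration, computes the mean arrival time of the third patient and the probability of at most two arrivals in 60 minutes directly from the Poisson probabilities.

```python
import math

mean_interarrival = 10.0            # minutes between patients, 1 / lambda
lam = 1.0 / mean_interarrival       # arrival rate per minute

# (a) Expected arrival time of the third patient: E(T_3) = 3 / lambda.
expected_wait = 3 * mean_interarrival

# (b) P[X(60) <= 2]: sum of the Poisson probabilities for 0, 1, and 2 arrivals.
mean_count = lam * 60.0
prob_nobody_admitted = sum(
    math.exp(-mean_count) * mean_count ** n / math.factorial(n) for n in range(3)
)

print(expected_wait)
print(round(prob_nobody_admitted, 3))
```

The probability equals $25e^{-6}$, so roughly six times in a hundred the waiting room would hold fewer than three patients after a full hour.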


5.57. Let $T_n$ denote the time of the $n$th event of a Poisson process $X(t)$ with rate $\lambda$. Suppose that one event has occurred in the interval $(0, t)$. Show that the conditional distribution of arrival time $T_1$ is uniform over $(0, t)$.

For $\tau \le t$,

$$P[T_1 \le \tau \mid X(t) = 1] = \frac{P[T_1 \le \tau,\ X(t) = 1]}{P[X(t) = 1]} = \frac{P[X(\tau) = 1,\ X(t) - X(\tau) = 0]}{P[X(t) = 1]}$$
$$= \frac{P[X(\tau) = 1]\, P[X(t) - X(\tau) = 0]}{P[X(t) = 1]} = \frac{\lambda \tau e^{-\lambda \tau} e^{-\lambda (t - \tau)}}{\lambda t e^{-\lambda t}} = \frac{\tau}{t} \tag{5.191}$$

which indicates that $T_1$ is uniform over $(0, t)$ [see Eq. (2.57)].


5.58. Consider a Poisson process $X(t)$ with rate $\lambda$, and suppose that each time an event occurs, it is classified as either a type 1 or a type 2 event. Suppose further that the event is classified as a type 1 event with probability $p$ and a type 2 event with probability $1 - p$. Let $X_1(t)$ and $X_2(t)$ denote the number of type 1 and type 2 events, respectively, occurring in $(0, t)$. Show that $\{X_1(t), t \ge 0\}$ and $\{X_2(t), t \ge 0\}$ are both Poisson processes with rates $\lambda p$ and $\lambda(1 - p)$, respectively. Furthermore, the two processes are independent.

We have

$$X(t) = X_1(t) + X_2(t)$$

First we calculate the joint probability $P[X_1(t) = k, X_2(t) = m]$.

$$P[X_1(t) = k, X_2(t) = m] = \sum_{n=0}^{\infty} P[X_1(t) = k, X_2(t) = m \mid X(t) = n]\, P[X(t) = n]$$

Note that

$$P[X_1(t) = k, X_2(t) = m \mid X(t) = n] = 0 \qquad \text{when } n \ne k + m$$

Thus, using Eq. (5.181), we obtain

$$P[X_1(t) = k, X_2(t) = m] = P[X_1(t) = k, X_2(t) = m \mid X(t) = k + m]\, P[X(t) = k + m]$$
$$= P[X_1(t) = k, X_2(t) = m \mid X(t) = k + m]\, e^{-\lambda t} \frac{(\lambda t)^{k+m}}{(k + m)!}$$

Now, given that $k + m$ events occurred, since each event has probability $p$ of being a type 1 event and probability $1 - p$ of being a type 2 event, it follows that

$$P[X_1(t) = k, X_2(t) = m \mid X(t) = k + m] = \binom{k + m}{k} p^k (1 - p)^m$$

Thus,

$$P[X_1(t) = k, X_2(t) = m] = \binom{k + m}{k} p^k (1 - p)^m e^{-\lambda t} \frac{(\lambda t)^{k+m}}{(k + m)!}$$
$$= \frac{(k + m)!}{k!\, m!} p^k (1 - p)^m e^{-\lambda t} \frac{(\lambda t)^{k+m}}{(k + m)!}$$
$$= e^{-\lambda p t} \frac{(\lambda p t)^k}{k!}\; e^{-\lambda (1 - p) t} \frac{[\lambda (1 - p) t]^m}{m!} \tag{5.192}$$

Then,

$$P[X_1(t) = k] = \sum_{m=0}^{\infty} P[X_1(t) = k, X_2(t) = m]$$
$$= e^{-\lambda p t} \frac{(\lambda p t)^k}{k!}\, e^{-\lambda (1 - p) t} \sum_{m=0}^{\infty} \frac{[\lambda (1 - p) t]^m}{m!}$$
$$= e^{-\lambda p t} \frac{(\lambda p t)^k}{k!}\, e^{-\lambda (1 - p) t} e^{\lambda (1 - p) t}$$
$$= e^{-\lambda p t} \frac{(\lambda p t)^k}{k!} \tag{5.193}$$

which indicates that $X_1(t)$ is a Poisson process with rate $\lambda p$. Similarly, we can obtain

$$P[X_2(t) = m] = \sum_{k=0}^{\infty} P[X_1(t) = k, X_2(t) = m] = e^{-\lambda (1 - p) t} \frac{[\lambda (1 - p) t]^m}{m!} \tag{5.194}$$

and so $X_2(t)$ is a Poisson process with rate $\lambda(1 - p)$. Finally, from Eqs. (5.193), (5.194), and (5.192), we see that

$$P[X_1(t) = k, X_2(t) = m] = P[X_1(t) = k]\, P[X_2(t) = m]$$

Hence, $X_1(t)$ and $X_2(t)$ are independent.
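The factorization at the heart of Prob. 5.58 can be checked numerically: the binomial-times-Poisson joint probability should equal the product of two Poisson probabilities with means $\lambda p t$ and $\lambda(1-p)t$. The sketch below is a floating-point illustration with hypothetical parameter values and function names; agreement is therefore tested only up to rounding.

```python
import math

def poisson_pmf(n, mean):
    """Poisson probability of n events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** n / math.factorial(n)

def joint_pmf(k, m, lam, p, t):
    """P[X1(t) = k, X2(t) = m] by conditioning on X(t) = k + m."""
    n = k + m
    return math.comb(n, k) * p ** k * (1 - p) ** m * poisson_pmf(n, lam * t)

def product_pmf(k, m, lam, p, t):
    """Product of the two marginal Poisson probabilities."""
    return poisson_pmf(k, lam * p * t) * poisson_pmf(m, lam * (1 - p) * t)

lam, p, t = 3.0, 0.3, 2.0           # illustrative values only
for k, m in [(0, 0), (1, 2), (4, 3)]:
    print(k, m, joint_pmf(k, m, lam, p, t), product_pmf(k, m, lam, p, t))
```

Because the joint probability factors for every pair $(k, m)$, the type 1 and type 2 counts at a fixed time $t$ are independent Poisson r.v.'s.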


Wiener Processes 


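5.59. Let $X_1, \ldots, X_n$ be jointly normal r.v.'s. Show that the joint characteristic function of $X_1, \ldots, X_n$ is given by

$$\Psi_{X_1 \cdots X_n}(\omega_1, \ldots, \omega_n) = \exp\left( j \sum_{i=1}^{n} \omega_i \mu_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n} \omega_i \omega_k \sigma_{ik} \right) \tag{5.195}$$

where $\mu_i = E(X_i)$ and $\sigma_{ik} = \mathrm{Cov}(X_i, X_k)$.

Let

$$Y = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$$

By definition (4.66), the characteristic function of $Y$ is

$$\Psi_Y(\omega) = E\left[e^{j\omega(a_1 X_1 + \cdots + a_n X_n)}\right] = \Psi_{X_1 \cdots X_n}(\omega a_1, \ldots, \omega a_n) \tag{5.196}$$

Now, by the results of Prob. 4.72, we see that $Y$ is a normal r.v. with mean and variance given by [Eqs. (4.132) and (4.135)]

$$\mu_Y = E(Y) = \sum_{i=1}^{n} a_i E(X_i) = \sum_{i=1}^{n} a_i \mu_i \tag{5.197}$$

$$\sigma_Y^2 = \mathrm{Var}(Y) = \sum_{i=1}^{n} \sum_{k=1}^{n} a_i a_k \mathrm{Cov}(X_i, X_k) = \sum_{i=1}^{n} \sum_{k=1}^{n} a_i a_k \sigma_{ik} \tag{5.198}$$

Thus, by Eq. (4.167),

$$\Psi_Y(\omega) = \exp\left( j\omega \mu_Y - \frac{1}{2} \sigma_Y^2 \omega^2 \right) = \exp\left( j\omega \sum_{i=1}^{n} a_i \mu_i - \frac{1}{2} \omega^2 \sum_{i=1}^{n} \sum_{k=1}^{n} a_i a_k \sigma_{ik} \right) \tag{5.199}$$

Equating Eqs. (5.199) and (5.196) and setting $\omega = 1$, we get

$$\Psi_{X_1 \cdots X_n}(a_1, \ldots, a_n) = \exp\left( j \sum_{i=1}^{n} a_i \mu_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n} a_i a_k \sigma_{ik} \right)$$

By replacing $a_i$'s with $\omega_i$'s, we obtain Eq. (5.195); that is,

$$\Psi_{X_1 \cdots X_n}(\omega_1, \ldots, \omega_n) = \exp\left( j \sum_{i=1}^{n} \omega_i \mu_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n} \omega_i \omega_k \sigma_{ik} \right)$$

Let

$$\boldsymbol{\omega} = \begin{bmatrix} \omega_1 \\ \vdots \\ \omega_n \end{bmatrix} \qquad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} \qquad K = [\sigma_{ik}]$$

Then we can write

$$\sum_{i=1}^{n} \omega_i \mu_i = \boldsymbol{\omega}^T \boldsymbol{\mu} \qquad \sum_{i=1}^{n} \sum_{k=1}^{n} \omega_i \omega_k \sigma_{ik} = \boldsymbol{\omega}^T K \boldsymbol{\omega}$$

and Eq. (5.195) can be expressed more compactly as

$$\Psi_{\mathbf{X}}(\boldsymbol{\omega}) = \exp\left( j \boldsymbol{\omega}^T \boldsymbol{\mu} - \frac{1}{2} \boldsymbol{\omega}^T K \boldsymbol{\omega} \right) \tag{5.200}$$

5.60. Let $X_1, \ldots, X_n$ be jointly normal r.v.'s. Let

$$\begin{aligned} Y_1 &= a_{11} X_1 + \cdots + a_{1n} X_n \\ &\ \ \vdots \\ Y_m &= a_{m1} X_1 + \cdots + a_{mn} X_n \end{aligned} \tag{5.201}$$

where $a_{ij}$ ($i = 1, \ldots, m$; $j = 1, \ldots, n$) are constants. Show that $Y_1, \ldots, Y_m$ are also jointly normal r.v.'s.

Let

$$\mathbf{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} \qquad \mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_m \end{bmatrix} \qquad A = [a_{ij}] = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$$

Then Eq. (5.201) can be expressed as $\mathbf{Y} = A\mathbf{X}$. Let

$$\boldsymbol{\mu}_X = E(\mathbf{X}) = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} \qquad \boldsymbol{\omega} = \begin{bmatrix} \omega_1 \\ \vdots \\ \omega_m \end{bmatrix} \qquad K_X = [\sigma_{ik}] \tag{5.202}$$

Then the characteristic function for $\mathbf{Y}$ can be written as

$$\Psi_{\mathbf{Y}}(\omega_1, \ldots, \omega_m) = E\left(e^{j \boldsymbol{\omega}^T \mathbf{Y}}\right) = E\left(e^{j \boldsymbol{\omega}^T A \mathbf{X}}\right) = E\left[e^{j (A^T \boldsymbol{\omega})^T \mathbf{X}}\right] = \Psi_{\mathbf{X}}(A^T \boldsymbol{\omega})$$

Since $\mathbf{X}$ is a normal random vector, by Eq. (5.200) we can write

$$\Psi_{\mathbf{X}}(A^T \boldsymbol{\omega}) = \exp\left[ j (A^T \boldsymbol{\omega})^T \boldsymbol{\mu}_X - \frac{1}{2} (A^T \boldsymbol{\omega})^T K_X (A^T \boldsymbol{\omega}) \right] = \exp\left[ j \boldsymbol{\omega}^T A \boldsymbol{\mu}_X - \frac{1}{2} \boldsymbol{\omega}^T A K_X A^T \boldsymbol{\omega} \right]$$

Thus,

$$\Psi_{\mathbf{Y}}(\omega_1, \ldots, \omega_m) = \exp\left( j \boldsymbol{\omega}^T \boldsymbol{\mu}_Y - \frac{1}{2} \boldsymbol{\omega}^T K_Y \boldsymbol{\omega} \right) \tag{5.203}$$

where

$$\boldsymbol{\mu}_Y = A \boldsymbol{\mu}_X \qquad K_Y = A K_X A^T \tag{5.204}$$

Comparing Eqs. (5.200) and (5.203), we see that Eq. (5.203) is the characteristic function of a normal random vector $\mathbf{Y}$. Hence, we conclude that $Y_1, \ldots, Y_m$ are also jointly normal r.v.'s.

Note that on the basis of the above result, we can say that a random process $\{X(t), t \in T\}$ is a normal process if every finite linear combination of the r.v.'s $X(t_i)$, $t_i \in T$, is normally distributed.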
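The result of Prob. 5.60 says that a linear map $\mathbf{Y} = A\mathbf{X}$ sends the mean vector to $A\boldsymbol{\mu}_X$ and the covariance matrix to $A K_X A^T$. A minimal sketch of that bookkeeping follows; the matrix $A$, the moments, and the helper names are all hypothetical values chosen for illustration, and plain Python lists are used instead of a numerical library.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def transform_moments(A, mu_x, K_x):
    """Mean vector and covariance matrix of Y = A X."""
    mu_y = [row[0] for row in mat_mul(A, [[m] for m in mu_x])]
    K_y = mat_mul(mat_mul(A, K_x), transpose(A))
    return mu_y, K_y

# Illustrative example: Y1 = X1 + X2 and Y2 = X1 - X2.
A = [[1, 1], [1, -1]]
mu_x = [1, 2]
K_x = [[2, 1], [1, 3]]              # Var(X1) = 2, Var(X2) = 3, Cov(X1, X2) = 1
mu_y, K_y = transform_moments(A, mu_x, K_x)
print(mu_y, K_y)
```

The diagonal of `K_y` reproduces $\mathrm{Var}(X_1) + \mathrm{Var}(X_2) \pm 2\,\mathrm{Cov}(X_1, X_2)$, which is a convenient sanity check on the matrix formula.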


5.61. Show that a Wiener process $X(t)$ is a normal process.

Consider an arbitrary linear combination

$$\sum_{i=1}^{n} a_i X(t_i) = a_1 X(t_1) + a_2 X(t_2) + \cdots + a_n X(t_n) \tag{5.205}$$

where $0 \le t_1 < \cdots < t_n$ and $a_i$ are real constants. Now we write

$$\sum_{i=1}^{n} a_i X(t_i) = (a_1 + \cdots + a_n)[X(t_1) - X(0)] + (a_2 + \cdots + a_n)[X(t_2) - X(t_1)] + \cdots + (a_{n-1} + a_n)[X(t_{n-1}) - X(t_{n-2})] + a_n [X(t_n) - X(t_{n-1})] \tag{5.206}$$

Now from conditions 1 and 2 of Definition 5.7.1, the right-hand side of Eq. (5.206) is a linear combination of independent normal r.v.'s. Thus, based on the result of Prob. 5.60, the left-hand side of Eq. (5.206) is also a normal r.v.; that is, every finite linear combination of the r.v.'s $X(t_i)$ is a normal r.v. Thus, we conclude that the Wiener process $X(t)$ is a normal process.


5.62. A random process $\{X(t), t \in T\}$ is said to be continuous in probability if for every $\varepsilon > 0$ and $t \in T$,

$$\lim_{h \to 0} P\{|X(t + h) - X(t)| > \varepsilon\} = 0 \tag{5.207}$$

Show that a Wiener process $X(t)$ is continuous in probability.

From the Chebyshev inequality (2.116), we have

$$P\{|X(t + h) - X(t)| > \varepsilon\} \le \frac{\mathrm{Var}[X(t + h) - X(t)]}{\varepsilon^2} \qquad \varepsilon > 0$$

Since $X(t)$ has stationary increments, we have

$$\mathrm{Var}[X(t + h) - X(t)] = \mathrm{Var}[X(h)] = \sigma^2 h$$

in view of Eq. (5.63). Hence,

$$\lim_{h \to 0} P\{|X(t + h) - X(t)| > \varepsilon\} \le \lim_{h \to 0} \frac{\sigma^2 h}{\varepsilon^2} = 0$$

Thus, the Wiener process $X(t)$ is continuous in probability.


Martingales 


5.63. Let $Y = X_1 + X_2 + X_3$, where $X_i$ is the outcome of the $i$th toss of a fair coin. Verify the tower property Eq. (5.76).

Let $X_i = 1$ when it is a head and $X_i = 0$ when it is a tail. Since the coin is fair, we have

$$E(X_i) = \frac{1}{2}$$

and the $X_i$'s are independent. Now

$$E[E(Y \mid F_2) \mid F_1] = E[E(Y \mid X_2, X_1) \mid X_1] = E[X_1 + X_2 + E(X_3) \mid X_1]$$
$$= X_1 + E(X_2) + E(X_3) = X_1 + 1$$

and

$$E(Y \mid F_1) = E(Y \mid X_1) = E(X_1 + X_2 + X_3 \mid X_1)$$
$$= X_1 + E(X_2 + X_3)$$
$$= X_1 + E(X_2) + E(X_3) = X_1 + 1$$

Thus,

$$E[E(Y \mid F_2) \mid F_1] = E(Y \mid F_1)$$
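Because the sample space of three fair coin tosses has only eight equally likely outcomes, the tower property of Prob. 5.63 can be verified by brute-force enumeration. In the sketch below, `cond_exp` is a hypothetical helper that averages a function over all outcomes sharing the same first few tosses, which is what conditioning on $F_1$ or $F_2$ means here.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes (x1, x2, x3), with 1 = head and 0 = tail.
outcomes = list(product([0, 1], repeat=3))

def cond_exp(f, known):
    """Return E[f | first `known` tosses] as a function of the outcome."""
    def g(w):
        matching = [v for v in outcomes if v[:known] == w[:known]]
        return Fraction(sum(f(v) for v in matching), len(matching))
    return g

def Y(w):
    """Y = X1 + X2 + X3."""
    return sum(w)

inner = cond_exp(Y, 2)              # E(Y | F_2)
lhs = cond_exp(inner, 1)            # E[E(Y | F_2) | F_1]
rhs = cond_exp(Y, 1)                # E(Y | F_1)

for w in outcomes:
    print(w, lhs(w), rhs(w))
```

Both sides equal $X_1 + 1$ on every outcome, matching the algebraic verification.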


5.64. Let $X_1, X_2, \ldots$ be i.i.d. r.v.'s with mean $\mu$. Let

$$S_n = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$

Let $F_n$ denote the information contained in $X_1, \ldots, X_n$. Show that

$$E(S_n \mid F_m) = S_m + (n - m)\mu \qquad m < n \tag{5.208}$$

Let $m < n$; then by Eq. (5.71)

$$E(S_n \mid F_m) = E(X_1 + \cdots + X_m \mid F_m) + E(X_{m+1} + \cdots + X_n \mid F_m)$$

Since $X_1 + X_2 + \cdots + X_m$ is measurable with respect to $F_m$, by Eq. (5.73)

$$E(X_1 + \cdots + X_m \mid F_m) = X_1 + \cdots + X_m = S_m$$

Since $X_{m+1} + \cdots + X_n$ is independent of $X_1, \ldots, X_m$, by Eq. (5.75)

$$E(X_{m+1} + \cdots + X_n \mid F_m) = E(X_{m+1} + \cdots + X_n) = (n - m)\mu$$

Thus, we obtain

$$E(S_n \mid F_m) = S_m + (n - m)\mu \qquad m < n$$


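5.65. Let $X_1, X_2, \ldots$ be i.i.d. r.v.'s with $E(X_i) = 0$ and $E(X_i^2) = \sigma^2$ for all $i$. Let $S_n = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$. Let $F_n$ denote the information contained in $X_1, \ldots, X_n$.

Show that

$$E(S_n^2 \mid F_m) = S_m^2 + (n - m)\sigma^2 \qquad m < n \tag{5.209}$$

Let $m < n$; then by Eq. (5.71)

$$E(S_n^2 \mid F_m) = E([S_m + (S_n - S_m)]^2 \mid F_m)$$
$$= E(S_m^2 \mid F_m) + 2E[S_m (S_n - S_m) \mid F_m] + E[(S_n - S_m)^2 \mid F_m]$$

Since $S_m$ is dependent only on $X_1, \ldots, X_m$, by Eqs. (5.73) and (5.75)

$$E(S_m^2 \mid F_m) = S_m^2 \qquad E[(S_n - S_m)^2 \mid F_m] = E[(S_n - S_m)^2] = \mathrm{Var}(S_n - S_m) = (n - m)\sigma^2$$

since $E(X_i) = \mu = 0$, $\mathrm{Var}(X_i) = E(X_i^2) = \sigma^2$ and $\mathrm{Var}(S_n - S_m) = \mathrm{Var}(X_{m+1} + \cdots + X_n) = (n - m)\sigma^2$. Next, by Eq. (5.74)

$$E[S_m (S_n - S_m) \mid F_m] = S_m E[(S_n - S_m) \mid F_m] = S_m E(S_n - S_m) = 0$$

Thus, we obtain

$$E(S_n^2 \mid F_m) = S_m^2 + (n - m)\sigma^2 \qquad m < n$$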
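5.66. Verify Eq. (5.80), that is

$$E(M_m \mid F_n) = M_n \qquad \text{for } m \ge n$$

By condition (2) of martingale, Eq. (5.79), we have

$$E(M_{n+1} \mid F_n) = M_n \qquad \text{for all } n$$

Then by tower property Eq. (5.76)

$$E(M_{n+2} \mid F_n) = E[E(M_{n+2} \mid F_{n+1}) \mid F_n] = E(M_{n+1} \mid F_n) = M_n$$

and so on, and we obtain Eq. (5.80), that is

$$E(M_m \mid F_n) = M_n \qquad \text{for } m \ge n$$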
5.67. Verify Eq. (5.82), that is

$$E(M_n) = E(M_{n-1}) = \cdots = E(M_0)$$

Since $\{M_n, n \ge 0\}$ is a martingale, we have

$$E(M_{n+1} \mid F_n) = M_n \qquad \text{for all } n$$

Applying Eq. (5.77), we have

$$E[E(M_{n+1} \mid F_n)] = E(M_{n+1}) = E(M_n)$$

Thus, by induction we obtain

$$E(M_n) = E(M_{n-1}) = \cdots = E(M_0)$$


5.68. Let $X_1, X_2, \ldots$ be a sequence of independent r.v.'s with $E[|X_n|] < \infty$ and $E(X_n) = 0$ for all $n$. Set $S_0 = 0$, $S_n = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$. Show that $\{S_n, n \ge 0\}$ is a martingale.

$$E(|S_n|) \le E(|X_1|) + \cdots + E(|X_n|) < \infty$$
$$E(S_{n+1} \mid F_n) = E(S_n + X_{n+1} \mid F_n)$$
$$= S_n + E(X_{n+1} \mid F_n) = S_n + E(X_{n+1}) = S_n$$

since $E(X_n) = 0$ for all $n$.

Thus, $\{S_n, n \ge 0\}$ is a martingale.


5.69. Consider the same problem as Prob. 5.68 except $E(X_n) \ge 0$ for all $n$. Show that $\{S_n, n \ge 0\}$ is a submartingale.

Assume $\max E(|X_n|) = k < \infty$; then

$$E(|S_n|) \le E(|X_1|) + \cdots + E(|X_n|) \le nk < \infty$$
$$E(S_{n+1} \mid F_n) = E(S_n + X_{n+1} \mid F_n)$$
$$= S_n + E(X_{n+1} \mid F_n) = S_n + E(X_{n+1}) \ge S_n$$

since $E(X_n) \ge 0$ for all $n$.

Thus, $\{S_n, n \ge 0\}$ is a submartingale.
5.70. Let $X_1, X_2, \ldots$ be a sequence of Bernoulli r.v.'s with

$$X_i = \begin{cases} 1 & \text{with probability } p \\ -1 & \text{with probability } q = 1 - p \end{cases}$$

Let $S_n = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$. Show that (1) if $p = \frac{1}{2}$, then $\{S_n\}$ is a martingale, (2) if $p > \frac{1}{2}$, then $\{S_n\}$ is a submartingale, and (3) if $p < \frac{1}{2}$, then $\{S_n\}$ is a supermartingale.

$$E(X_i) = p(1) + (1 - p)(-1) = 2p - 1$$

(1) If $p = \frac{1}{2}$, $E(X_i) = 0$, and

$$E(|S_n|) \le E(|X_1| + \cdots + |X_n|) = E(|X_1|) + \cdots + E(|X_n|) = n < \infty$$
$$E(S_{n+1} \mid F_n) = E(S_n + X_{n+1} \mid F_n) = S_n + E(X_{n+1} \mid F_n) = S_n + E(X_{n+1}) = S_n$$

Thus, $\{S_n\}$ is a martingale.

(2) If $p > \frac{1}{2}$, $0 < E(X_i) \le 1$, and

$$E(|S_n|) \le E(|X_1| + \cdots + |X_n|) = E(|X_1|) + \cdots + E(|X_n|) = n < \infty$$
$$E(S_{n+1} \mid F_n) = E(S_n + X_{n+1} \mid F_n) = S_n + E(X_{n+1} \mid F_n) = S_n + E(X_{n+1}) > S_n$$

Thus, $\{S_n\}$ is a submartingale.

(3) If $p < \frac{1}{2}$, $E(X_i) < 0$, and

$$E(S_{n+1} \mid F_n) = E(S_n + X_{n+1} \mid F_n) = S_n + E(X_{n+1} \mid F_n) = S_n + E(X_{n+1}) < S_n$$

Thus, $\{S_n\}$ is a supermartingale.

Note that this problem represents a coin-tossing game in which "heads" you win \$1 and "tails" you lose \$1. Thus, if $p = \frac{1}{2}$, it is a fair coin; if $p > \frac{1}{2}$, the game is favorable; and if $p < \frac{1}{2}$, the game is unfavorable.


5.71. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. r.v.'s with $E(X_i) = \mu > 0$. Set $S_0 = 0$, $S_n = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$ and

$$M_n = S_n - n\mu \tag{5.210}$$

Show that $\{M_n, n \ge 0\}$ is a martingale.

Provided $E(|X_i|) < \infty$,

$$E(|M_n|) = E(|S_n - n\mu|) \le E(|S_n|) + n\mu \le \sum_{i=1}^{n} E(|X_i|) + n\mu < \infty$$

Next, using Eq. (5.208) of Prob. 5.64, we have

$$E(M_{n+1} \mid F_n) = E(S_{n+1} - (n + 1)\mu \mid F_n)$$
$$= E(S_{n+1} \mid F_n) - (n + 1)\mu$$
$$= S_n + \mu - (n + 1)\mu = S_n - n\mu = M_n$$

Thus, $\{M_n, n \ge 0\}$ is a martingale.


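5.72. Let $X_1, X_2, \ldots$ be i.i.d. r.v.'s with $E(X_i) = 0$ and $E(X_i^2) = \sigma^2$ for all $i$. Let $S_0 = 0$, $S_n = \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$ and

$$M_n = S_n^2 - n\sigma^2 \tag{5.211}$$

Show that $\{M_n, n \ge 0\}$ is a martingale.

$$M_n = S_n^2 - n\sigma^2 = \left[\sum_{i=1}^{n} X_i\right]^2 - n\sigma^2 = \sum_{i=1}^{n} X_i^2 + 2\sum_{i<j} X_i X_j - n\sigma^2$$

Using the triangle inequality, we have

$$E(|M_n|) \le \sum_{i=1}^{n} E(X_i^2) + 2\sum_{i<j} E(|X_i X_j|) + n\sigma^2$$

Using the Cauchy-Schwarz inequality [Eq. (4.41)], we have

$$E(|X_i X_j|) \le \sqrt{E(X_i^2) E(X_j^2)} = \sigma^2$$

Thus, since there are $n(n - 1)/2$ pairs with $i < j$,

$$E(|M_n|) \le n\sigma^2 + n(n - 1)\sigma^2 + n\sigma^2 = n(n + 1)\sigma^2 < \infty$$

Next,

$$E(M_{n+1} \mid F_n) = E[(X_{n+1} + S_n)^2 - (n + 1)\sigma^2 \mid F_n]$$
$$= E[X_{n+1}^2 + 2X_{n+1} S_n + S_n^2 - (n + 1)\sigma^2 \mid F_n]$$
$$= M_n + E(X_{n+1}^2) + 2E(X_{n+1}) S_n - \sigma^2$$
$$= M_n + \sigma^2 - \sigma^2 = M_n$$

Thus, $\{M_n, n \ge 0\}$ is a martingale.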


5.73. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. r.v.'s with $E(X_i) = \mu$ and $E(|X_i|) < \infty$ for all $i$. Show that

$$M_n = \frac{1}{\mu^n} \prod_{i=1}^{n} X_i \tag{5.212}$$

is a martingale.

By independence,

$$E(|M_n|) = \frac{1}{|\mu|^n} \prod_{i=1}^{n} E(|X_i|) < \infty$$

$$E(M_{n+1} \mid F_n) = E\left( M_n \frac{1}{\mu} X_{n+1} \,\Big|\, F_n \right) = M_n \frac{1}{\mu} E(X_{n+1}) = M_n \frac{\mu}{\mu} = M_n$$

Thus, $\{M_n\}$ is a martingale.


5.74. An urn initially contains a red ball and a black ball. At each time $n \ge 1$, a ball is taken randomly, its color noted, and both this ball and another ball of the same color are put back into the urn. Continuing similarly, after $n$ draws the urn contains $n + 2$ balls. Let $X_n$ denote the number of black balls after $n$ draws. Let $M_n = X_n / (n + 2)$ be the fraction of black balls after $n$ draws. Show that $\{M_n, n \ge 0\}$ is a martingale. (This is known as Polya's Urn.)

$X_0 = 1$ and $X_n$ is a (time-inhomogeneous) Markov chain with transition probabilities

$$P(X_{n+1} = k + 1 \mid X_n = k) = \frac{k}{n + 2} \qquad \text{and} \qquad P(X_{n+1} = k \mid X_n = k) = \frac{n + 2 - k}{n + 2}$$

and $X_n$ takes values in $\{1, 2, \ldots, n + 1\}$ and

$$E(X_{n+1} \mid X_n) = X_n + \frac{X_n}{n + 2}$$

Now,

$$E(|M_n|) = E\left( \frac{|X_n|}{n + 2} \right) \le \frac{n + 1}{n + 2} < \infty$$

and

$$E(M_{n+1} \mid F_n) = E\left( \frac{X_{n+1}}{n + 3} \,\Big|\, X_n \right) = \frac{1}{n + 3} E(X_{n+1} \mid X_n) = \frac{1}{n + 3} \left( X_n + \frac{X_n}{n + 2} \right) = \frac{X_n}{n + 2} = M_n$$

Thus, $\{M_n, n \ge 0\}$ is a martingale.
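A consequence of the martingale property in Prob. 5.74 is that the expected fraction of black balls never moves from its initial value $\frac{1}{2}$. The sketch below propagates the exact distribution of $X_n$ through the transition probabilities $k/(n+2)$ and $(n+2-k)/(n+2)$ and checks that expectation; the function names are assumptions made for illustration, and a constant mean is only a necessary consequence of the martingale property, not a proof of it.

```python
from fractions import Fraction

def polya_distribution(n):
    """Exact distribution of X_n (black balls after n draws), with X_0 = 1."""
    dist = {1: Fraction(1)}
    for step in range(n):
        total = step + 2                        # balls in the urn before this draw
        new = {}
        for k, prob in dist.items():
            # A black ball is drawn with probability k / total.
            new[k + 1] = new.get(k + 1, 0) + prob * Fraction(k, total)
            # Otherwise a red ball is drawn and the black count is unchanged.
            new[k] = new.get(k, 0) + prob * Fraction(total - k, total)
        dist = new
    return dist

def expected_fraction(n):
    """E(M_n), the expected fraction of black balls after n draws."""
    return sum(prob * Fraction(k, n + 2)
               for k, prob in polya_distribution(n).items())

for n in range(6):
    print(n, expected_fraction(n))
```

Individual realizations of $M_n$ wander, but their average over all paths stays fixed, which is the sense in which the urn is a fair game.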


5.75. Let $X_1, X_2, \ldots$ be a sequence of independent r.v.'s with

$$P\{X_i = 1\} = P\{X_i = -1\} = \frac{1}{2}$$

We can think of $X_i$ as the result of a fair coin-tossing game where one wins \$1 if heads come up and loses \$1 if tails come up. One betting strategy is to keep doubling the bet until one eventually wins. At this point one stops. (This strategy is the original martingale game.) Let $S_n$ denote the winnings (or losses) up through $n$ tosses, with $S_0 = 0$. Whenever one wins, one stops playing, so $P(S_{n+1} = 1 \mid S_n = 1) = 1$. Show that $\{S_n, n \ge 0\}$ is a martingale; that is, the game is fair.

Suppose the first $n$ tosses of the coin have turned up tails. So the loss $S_n$ is given by

$$S_n = -(1 + 2 + 4 + \cdots + 2^{n-1}) = -(2^n - 1)$$

At this time, one doubles the bet again and bets $2^n$ on the next toss. This gives

$$P(S_{n+1} = 1 \mid S_n = -(2^n - 1)) = \frac{1}{2} \qquad P(S_{n+1} = -(2^{n+1} - 1) \mid S_n = -(2^n - 1)) = \frac{1}{2}$$

and

$$E(S_{n+1} \mid F_n) = \frac{1}{2}(1) + \frac{1}{2}\left[-(2^{n+1} - 1)\right] = -(2^n - 1) = S_n$$

Thus, $\{S_n, n \ge 0\}$ is a martingale.

5.76. Let $\{X_n, n \ge 0\}$ be a martingale with respect to the filtration $F_n$ and let $g$ be a convex function such that $E[|g(X_n)|] < \infty$ for all $n \ge 0$. Then show that the sequence $\{Z_n, n \ge 0\}$ defined by

$$Z_n = g(X_n) \tag{5.213}$$

is a submartingale with respect to $F_n$.

$$E(|Z_n|) = E(|g(X_n)|) < \infty$$

By Jensen's inequality Eq. (4.40) and the martingale property of $X_n$, we have

$$E(Z_{n+1} \mid F_n) = E[g(X_{n+1}) \mid F_n] \ge g[E(X_{n+1} \mid F_n)] = g(X_n) = Z_n$$

Thus, $\{Z_n, n \ge 0\}$ is a submartingale.
5.77. Let $F_n$ be a filtration and $E(|X|) < \infty$. Define

$$X_n = E(X \mid F_n) \tag{5.214}$$

Show that $\{X_n, n \ge 0\}$ is a martingale with respect to $F_n$.

$$E(|X_n|) = E(|E(X \mid F_n)|) \le E[E(|X| \mid F_n)] = E(|X|) < \infty$$

$$E(X_{n+1} \mid F_n) = E[E(X \mid F_{n+1}) \mid F_n]$$
$$= E(X \mid F_n) \qquad \text{by Eq. (5.76)}$$
$$= X_n$$

Thus, $\{X_n, n \ge 0\}$ is a martingale with respect to $F_n$.


5.78. Prove Theorem 5.8.2 (Doob decomposition).

Since X is a submartingale, we have

    E(X_{n+1} | F_n) \ge X_n    (5.215)

Let

    d_n = E(X_{n+1} - X_n | F_n) = E(X_{n+1} | F_n) - X_n \ge 0    (5.216)

and d_n is F_n-measurable.

Set A_0 = 0, A_n = \sum_{i=0}^{n-1} d_i = d_0 + d_1 + ... + d_{n-1}, and M_n = X_n - A_n. Then it is
easily seen that (2), (3), and (4) of Theorem 5.8.2 are satisfied. Next,

    E(M_{n+1} | F_n) = E(X_{n+1} - A_{n+1} | F_n) = E(X_{n+1} | F_n) - A_{n+1}
                     = X_n + d_n - \sum_{i=0}^{n} d_i = X_n - \sum_{i=0}^{n-1} d_i = X_n - A_n = M_n

Thus, (1) of Theorem 5.8.2 is also verified.

5.79. Let {M_n, n \ge 0} be a martingale. Suppose that the stopping time T is
bounded, that is, T \le k. Then show that

    E(M_T) = E(M_0)    (5.217)


Note that I_{\{T = j\}}, the indicator function of the event {T = j}, is F_j-
measurable (since we need only the information up to time j to
determine if we have stopped by time j). Then we can write

    M_T = \sum_{j=0}^{k} M_j I_{\{T = j\}}

and

    E(M_T | F_{k-1}) = E(M_k I_{\{T = k\}} | F_{k-1}) + \sum_{j=0}^{k-1} E(M_j I_{\{T = j\}} | F_{k-1})

For j \le k - 1, M_j I_{\{T = j\}} is F_{k-1}-measurable; thus,

    E(M_j I_{\{T = j\}} | F_{k-1}) = M_j I_{\{T = j\}}

Since T is known to be no more than k, the event {T = k} is the same
as the event {T > k - 1}, which is F_{k-1}-measurable. Thus,

    E(M_k I_{\{T = k\}} | F_{k-1}) = E(M_k I_{\{T > k-1\}} | F_{k-1})
                                   = I_{\{T > k-1\}} E(M_k | F_{k-1}) = I_{\{T > k-1\}} M_{k-1}

since {M_n} is a martingale. Hence,

    E(M_T | F_{k-1}) = I_{\{T > k-1\}} M_{k-1} + \sum_{j=0}^{k-1} M_j I_{\{T = j\}}

In a similar way, we can derive

    E(M_T | F_{k-2}) = I_{\{T > k-2\}} M_{k-2} + \sum_{j=0}^{k-2} M_j I_{\{T = j\}}

We continue this process until we get

    E(M_T | F_0) = M_0

and finally

    E[E(M_T | F_0)] = E(M_T) = E(M_0)
5.80. Verify the Optional Stopping Theorem.

Consider the stopping times T_n = min{T, n}. Note that

    M_T = M_{T_n} + M_T I_{\{T > n\}} - M_n I_{\{T > n\}}    (5.218)

Hence,

    E(M_T) = E(M_{T_n}) + E(M_T I_{\{T > n\}}) - E(M_n I_{\{T > n\}})    (5.219)

Since T_n is a bounded stopping time, by Eq. (5.217), we have

    E(M_{T_n}) = E(M_0)    (5.220)

and \lim_{n \to \infty} P(T > n) = 0; then if E(|M_T|) < \infty [condition (1), Eq. (5.83)],
we have \lim_{n \to \infty} E(|M_T| I_{\{T > n\}}) = 0. Thus, by condition (3), Eq. (5.85), we
get \lim_{n \to \infty} E(|M_n| I_{\{T > n\}}) = 0. Hence, by Eqs. (5.219) and (5.220), we
obtain

    E(M_T) = E(M_0)


5.81. Let two gamblers, A and B, initially have a dollars and b dollars,
respectively. Suppose that at each round of tossing a fair coin A wins
one dollar from B if “heads” comes up, and gives one dollar to B if
“tails” comes up. The game continues until either A or B runs out of
money.

(a) What is the probability that when the game ends, A has all the
cash?

(b) What is the expected duration of the game?

(a) Let X_1, X_2, ... be the sequence of play-by-play increments in A’s
fortune; thus, X_i = \pm 1 according to whether the ith toss is “heads” or
“tails.” The total change in A’s fortune after n plays is S_n = \sum_{i=1}^{n} X_i.
The game continues until time T, where T = min{n: S_n = -a or +b}.
It is easily seen that T is a stopping time with respect to F_n = \sigma(X_1,
X_2, ..., X_n) and {S_n} is a martingale with respect to F_n. (See Prob.
5.68.) Thus, by the Optional Stopping Theorem, for each n < \infty,

    0 = E(S_0) = E(S_{min(T, n)})
      = -a P(T \le n and S_T = -a) + b P(T \le n and S_T = b) + E(S_n I_{\{T > n\}})

As n \to \infty, the probability of the event {T > n} converges to zero.
Since S_n must be between -a and b on the event {T > n}, it follows
that E(S_n I_{\{T > n\}}) converges to zero as n \to \infty. Thus, letting n \to
\infty, we obtain

    -a P(S_T = -a) + b P(S_T = b) = 0    (5.221)

Since S_T must be -a or b, we have

    P(S_T = -a) + P(S_T = b) = 1    (5.222)

Solving Eqs. (5.221) and (5.222) for P(S_T = -a) and P(S_T = b), we
obtain (cf. Prob. 5.43)

    P(S_T = -a) = \frac{b}{a + b}    P(S_T = b) = \frac{a}{a + b}    (5.223)

Thus, the probability that when the game ends, A has all the cash
is a/(a + b).
(b) It is seen that {S_n^2 - n} is a martingale (see Prob. 5.72, \sigma^2 = 1).
Then the Optional Stopping Theorem implies that, for each n \ge 1,

    E(S_{min(T, n)}^2 - min(T, n)) = 0    (5.224)

Thus,

    E[min(T, n)] = E(S_{min(T, n)}^2) = E(S_T^2 I_{\{T \le n\}}) + E(S_n^2 I_{\{T > n\}})    (5.225)

Now, as n \to \infty, min(T, n) \to T and S_T^2 I_{\{T \le n\}} \to S_T^2, and

    \lim_{n \to \infty} E[min(T, n)] = E(T)

    \lim_{n \to \infty} E(S_T^2 I_{\{T \le n\}}) = E(S_T^2) = a^2 \frac{b}{a + b} + b^2 \frac{a}{a + b} = ab

Since S_n^2 is bounded on the event {T > n}, and since the
probability of this event converges to zero as n \to \infty, E(S_n^2 I_{\{T > n\}})
\to 0 as n \to \infty. Thus, as n \to \infty, Eq. (5.225) reduces to

    E(T) = ab    (5.226)
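
The truncation argument of Prob. 5.81 can be reproduced numerically. The following sketch propagates the distribution of S_{min(T, n)} step by step for a fair walk started at 0; the function name `gamblers_ruin` and the values a = 3, b = 5, n = 2000 are illustrative assumptions, not part of the problem.

```python
def gamblers_ruin(a, b, n_steps):
    """Propagate the law of S_{min(T, n)} for a fair +/-1 walk started at 0.

    Returns (P(T <= n, S_T = -a), P(T <= n, S_T = b), E[min(T, n)], E[S_{min(T, n)}]).
    """
    alive = {0: 1.0}            # P(S_k = s, T > k) for the interior states s
    p_lose = p_win = 0.0
    expected_time = 0.0
    for _ in range(n_steps):
        expected_time += sum(alive.values())   # adds P(T > k); E[min(T, n)] = sum_k P(T > k)
        nxt = {}
        for s, p in alive.items():
            for step in (-1, 1):
                t = s + step
                if t == -a:
                    p_lose += 0.5 * p          # A is ruined
                elif t == b:
                    p_win += 0.5 * p           # A holds all the cash
                else:
                    nxt[t] = nxt.get(t, 0.0) + 0.5 * p
        alive = nxt
    mean = -a * p_lose + b * p_win + sum(s * p for s, p in alive.items())
    return p_lose, p_win, expected_time, mean

# Hypothetical stakes: a = 3, b = 5.
p_lose, p_win, expected_time, mean = gamblers_ruin(a=3, b=5, n_steps=2000)
```

With these values, `p_win` should agree with a/(a + b) = 3/8 and `expected_time` with ab = 15 up to rounding, while `mean` stays at 0 for every n, as the martingale property requires.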


5.82. Let X(t) be a Poisson process with rate \lambda > 0. Show that X(t) - \lambda t is a
martingale.

We have

    E(|X(t) - \lambda t|) \le E[X(t)] + \lambda t = 2\lambda t < \infty

since X(t) \ge 0 and, by Eq. (5.56), E[X(t)] = \lambda t. For s < t,

    E[X(t) - \lambda t | F_s] = E[X(s) - \lambda t + X(t) - X(s) | F_s]
                              = E[X(s) - \lambda t | F_s] + E[X(t) - X(s) | F_s]
                              = X(s) - \lambda t + E[X(t) - X(s)]
                              = X(s) - \lambda t + \lambda(t - s) = X(s) - \lambda s

Thus, X(t) - \lambda t is a martingale.


SUPPLEMENTARY PROBLEMS 


5.83. Consider a random process X(n) = {X_n, n \ge 1}, where

    X_n = Z_1 + Z_2 + ... + Z_n

and Z_n are iid r.v.’s with zero mean and variance \sigma^2. Is X(n) stationary?


5.84. Consider a random process X(t) defined by

    X(t) = Y cos(\omega t + \Theta)

where Y and \Theta are independent r.v.’s and are uniformly distributed
over (-A, A) and (-\pi, \pi), respectively.
(a) Find the mean of X(t).

(b) Find the autocorrelation function R_X(t, s) of X(t).


5.85. Suppose that a random process X(t) is wide-sense stationary with
autocorrelation

    R_X(t, t + \tau) = e^{-|\tau|/2}

(a) Find the second moment of the r.v. X(5).
(b) Find the second moment of the r.v. X(5) - X(3).


5.86. Consider a random process X(t) defined by

    X(t) = U cos t + (V + 1) sin t    -\infty < t < \infty

where U and V are independent r.v.’s for which

    E(U) = E(V) = 0    E(U^2) = E(V^2) = 1

(a) Find the autocovariance function K_X(t, s) of X(t).
(b) Is X(t) WSS?
5.87. Consider the random processes

    X(t) = A_0 cos(\omega_0 t + \Theta)    Y(t) = A_1 cos(\omega_1 t + \Phi)

where A_0, A_1, \omega_0, and \omega_1 are constants, and r.v.’s \Theta and \Phi are
independent and uniformly distributed over (-\pi, \pi).
(a) Find the cross-correlation function R_{XY}(t, t + \tau) of X(t) and Y(t).

(b) Repeat (a) if \Theta = \Phi.
5.88. Given a Markov chain {X_n, n \ge 0}, find the joint pmf

    P(X_0 = i_0, X_1 = i_1, ..., X_n = i_n)


5.89. Let {X_n, n \ge 0} be a homogeneous Markov chain. Show that

    P(X_{n+1} = k_1, ..., X_{n+m} = k_m | X_0 = i_0, ..., X_n = i) = P(X_1 = k_1, ..., X_m = k_m | X_0 = i)
5.90. Verify Eq. (5.37). 


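5.91. Find P^n for the following transition probability matrices:

    (a) P = \begin{bmatrix} 1 & 0 \\ 0.5 & 0.5 \end{bmatrix}
    (b) P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
    (c) P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0.3 & 0.2 & 0.5 \end{bmatrix}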


5.92. A certain product is made by two companies, A and B, that control the
entire market. Currently, A and B have 60 percent and 40 percent,
respectively, of the total market. Each year, A loses 2/3 of its market
share to B, while B loses 1/2 of its share to A. Find the relative
proportion of the market that each holds after 2 years.


5.93. Consider a Markov chain with states {0, 1, 2} and transition probability
matrix

    P = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ 1 & 0 & 0 \end{bmatrix}

Is state 0 periodic?
5.94. Verify Eq. (5.51). 


5.95. Consider a Markov chain with transition probability matrix

    P = \begin{bmatrix} 0.6 & 0.2 & 0.2 \\ 0.4 & 0.5 & 0.1 \\ 0.6 & 0 & 0.4 \end{bmatrix}

Find the steady-state probabilities.
5.96. Let X(t) be a Poisson process with rate \lambda. Find E[X^2(t)].

5.97. Let X(t) be a Poisson process with rate \lambda. Find E{[X(t) - X(s)]^2} for t > s.

5.98. Let X(t) be a Poisson process with rate \lambda. Find

    P[X(t - d) = k | X(t) = j]    d > 0

5.99. Let T_n denote the time of the nth event of a Poisson process with rate
\lambda. Find the variance of T_n.


5.100. Assume that customers arrive at a bank in accordance with a Poisson
process with rate \lambda = 3 per hour, and suppose that each customer is a
man with probability 1/3 and a woman with probability 2/3. Now suppose
that 10 men arrived in the first 2 hours. How many women would you
expect to have arrived in the first 2 hours?


5.101. Let X_1, ..., X_n be jointly normal r.v.’s. Let

    Y_i = X_i + c_i    i = 1, ..., n

where c_i are constants. Show that Y_1, ..., Y_n are also jointly normal
r.v.’s.
5.102. Derive Eq. (5.63). 


5.103. Let X_1, X_2, ... be a sequence of Bernoulli r.v.’s in Prob. 5.70. Let M_n =
S_n - n(2p - 1). Show that {M_n} is a martingale.


5.104. Let X_1, X_2, ... be i.i.d. r.v.’s where X_i can take only two values, 3/2 and 1/2,
with equal probability. Let M_0 = 1 and M_n = \prod_{i=1}^{n} X_i. Show that {M_n, n \ge
0} is a martingale.


5.105. Consider {X_n} of Prob. 5.70 and S_n = \sum_{i=1}^{n} X_i. Let Y_n = (q/p)^{S_n}. Show
that {Y_n} is a martingale.
5.106. Let X(t) be a Wiener process (or Brownian motion). Show that
{X(t)} is a martingale.


ANSWERS TO SUPPLEMENTARY PROBLEMS 


5.83. No. 


5.84. (a) E[X(t)] = 0;
(b) R_X(t, s) = \frac{1}{6} A^2 cos \omega(t - s)


5.85. (a) E[X^2(5)] = 1;
(b) E{[X(5) - X(3)]^2} = 2(1 - e^{-1})


5.86. (a) K_X(t, s) = cos(s - t);
(b) No.


5.87. (a) R_{XY}(t, t + \tau) = 0
(b) R_{XY}(t, t + \tau) = \frac{A_0 A_1}{2} cos[(\omega_1 - \omega_0)t + \omega_1 \tau]


5.88. Hint: Use Eq. (5.32).
p_{i_0}(0) p_{i_0 i_1} p_{i_1 i_2} ... p_{i_{n-1} i_n}
5.89. Use the Markov property (5.27) and the homogeneity property.


5.90. Hint: Write Eq. (5.39) in terms of components. 
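5.91. (a) P^n = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} + (0.5)^n \begin{bmatrix} 0 & 0 \\ -1 & 1 \end{bmatrix}

(b) P^n = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

(c) P^n = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0.6 & 0.4 & 0 \end{bmatrix} + (0.5)^n \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -0.6 & -0.4 & 1 \end{bmatrix}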

5.92. A has 43.3 percent and B has 56.7 percent. 
5.93. Hint: Draw the state transition diagram. 
No. 


5.94. Hint: Let N = [N_{jk}], where N_{jk} is the number of times the state k (\in B)
is occupied until absorption takes place when X(n) starts in
state j (\in B). Then T_j = \sum_{k \in B} N_{jk}; calculate E(N_{jk}).


5.95. p = [5/9  2/9  2/9]


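5.96. \lambda t + \lambda^2 t^2
5.97. Hint: Use the independent stationary increments condition and the
result of Prob. 5.96.
\lambda(t - s) + \lambda^2 (t - s)^2


5.98. \frac{j!}{k!(j - k)!} \left( \frac{t - d}{t} \right)^k \left( \frac{d}{t} \right)^{j - k}


5.99. n/\lambda^2
5.100. 4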
5.101. Hint: See Prob. 5.60. 


5.102. Hint: Use condition (1) of a Wiener process and Eq. (5.102) of Prob. 
= ew 


5.103. Hint: Note that M_n is the random number S_n minus its expected
value.


5.104. Hint: Use definition 5.7.1. 


CHAPTER 6 


Analysis and Processing of Random Processes 


6.1 Introduction 


In this chapter, we introduce the methods for analysis and processing of random 
processes. First, we introduce the definitions of stochastic continuity, stochastic 
derivatives, and stochastic integrals of random processes. Next, the notion of 
power spectral density is introduced. This concept enables us to study wide-sense 
stationary processes in the frequency domain and define a white noise process. The 
response of linear systems to random processes is then studied. Finally, orthogonal 
and spectral representations of random processes are presented. 


6.2 Continuity, Differentiation, Integration 


In this section, we shall consider only the continuous-time random processes. 


A. Stochastic Continuity: 


A random process X(t) is said to be continuous in mean square or mean square
(m.s.) continuous if

    \lim_{\epsilon \to 0} E{[X(t + \epsilon) - X(t)]^2} = 0    (6.1)

The random process X(t) is m.s. continuous if and only if its autocorrelation
function is continuous at t = s (Prob. 6.1). If X(t) is WSS, then it is m.s. continuous
if and only if its autocorrelation function R_X(\tau) is continuous at \tau = 0. If X(t) is m.s.
continuous, then its mean is continuous; that is,

    \lim_{\epsilon \to 0} \mu_X(t + \epsilon) = \mu_X(t)    (6.2)

which can be written as

    \lim_{\epsilon \to 0} E[X(t + \epsilon)] = E[\lim_{\epsilon \to 0} X(t + \epsilon)]    (6.3)

Hence, if X(t) is m.s. continuous, then we may interchange the order of the
operations of expectation and limiting. Note that m.s. continuity of X(t) does not
imply that the sample functions of X(t) are continuous. For instance, the Poisson
process is m.s. continuous (Prob. 6.46), but sample functions of the Poisson
process have a countably infinite number of discontinuities (see Fig. 5-3).
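
The Poisson example can be made concrete. Assuming the standard fact that the increment X(t + \epsilon) - X(t) of a Poisson process with rate \lambda is a Poisson r.v. with mean \lambda\epsilon, the sketch below sums its pmf directly to get the mean-square increment in Eq. (6.1); the rate value and the helper name `poisson_second_moment` are illustrative assumptions.

```python
import math

def poisson_second_moment(mean, terms=200):
    """E[N^2] for N ~ Poisson(mean), summed directly from the pmf."""
    total, pmf = 0.0, math.exp(-mean)      # pmf starts at P(N = 0)
    for k in range(terms):
        total += k * k * pmf
        pmf *= mean / (k + 1)              # P(N = k + 1) obtained from P(N = k)
    return total

rate = 2.0  # hypothetical rate lambda
# Pairs (epsilon, E{[X(t + epsilon) - X(t)]^2}) for shrinking epsilon.
ms_increments = [(eps, poisson_second_moment(rate * eps)) for eps in (1.0, 0.1, 0.01, 0.001)]
```

Each value equals \lambda\epsilon + (\lambda\epsilon)^2, which tends to zero with \epsilon even though every sample path has jumps.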


B. Stochastic Derivatives: 


A random process X(t) is said to have a m.s. derivative X'(t) if

    l.i.m._{\epsilon \to 0} \frac{X(t + \epsilon) - X(t)}{\epsilon} = X'(t)    (6.4)

where l.i.m. denotes limit in the mean (square); that is,

    \lim_{\epsilon \to 0} E\left\{ \left[ \frac{X(t + \epsilon) - X(t)}{\epsilon} - X'(t) \right]^2 \right\} = 0    (6.5)

The m.s. derivative of X(t) exists if \partial^2 R_X(t, s)/\partial t \partial s exists at t = s (Prob. 6.6). If
X(t) has the m.s. derivative X'(t), then its mean and autocorrelation function are
given by

    E[X'(t)] = \frac{d}{dt} E[X(t)] = \mu_X'(t)    (6.6)

    R_{X'}(t, s) = \frac{\partial^2 R_X(t, s)}{\partial t \partial s}    (6.7)

Equation (6.6) indicates that the operations of differentiation and expectation may
be interchanged. If X(t) is a normal random process for which the m.s. derivative
X'(t) exists, then X'(t) is also a normal random process (Prob. 6.10).


C. Stochastic Integrals: 


A m.s. integral of a random process X(t) is defined by

    Y(t) = \int_{t_0}^{t} X(\alpha) d\alpha = l.i.m._{\Delta t_i \to 0} \sum_i X(t_i) \Delta t_i    (6.8)

where t_0 < t_1 < ... < t and \Delta t_i = t_{i+1} - t_i.

The m.s. integral of X(t) exists if the following integral exists (Prob. 6.11):

    \int_{t_0}^{t} \int_{t_0}^{t} R_X(\alpha, \beta) d\alpha d\beta    (6.9)

This implies that if X(t) is m.s. continuous, then its m.s. integral Y(t) exists (see
Prob. 6.1). The mean and the autocorrelation function of Y(t) are given by

    \mu_Y(t) = E\left[ \int_{t_0}^{t} X(\alpha) d\alpha \right] = \int_{t_0}^{t} E[X(\alpha)] d\alpha = \int_{t_0}^{t} \mu_X(\alpha) d\alpha    (6.10)

    R_Y(t, s) = E\left[ \int_{t_0}^{t} X(\alpha) d\alpha \int_{t_0}^{s} X(\beta) d\beta \right]
              = \int_{t_0}^{t} \int_{t_0}^{s} E[X(\alpha) X(\beta)] d\beta d\alpha = \int_{t_0}^{t} \int_{t_0}^{s} R_X(\alpha, \beta) d\beta d\alpha    (6.11)

Equation (6.10) indicates that the operations of integration and expectation may be
interchanged. If X(t) is a normal random process, then its integral Y(t) is also a
normal random process. This follows from the fact that \sum_i X(t_i) \Delta t_i is a linear
combination of the jointly normal r.v.’s (see Prob. 5.60).
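
Equation (6.11) can be evaluated numerically once R_X is known. As a rough sketch, the code below applies a midpoint rule to the double integral and uses the Wiener autocorrelation R_X(t, s) = \sigma^2 min(t, s) with t_0 = 0 as an assumed example; the function name, grid size, and parameter values are hypothetical.

```python
def integrated_autocorrelation(r_x, t, s, t0=0.0, n=400):
    """Midpoint-rule approximation of Eq. (6.11):
    R_Y(t, s) = int_{t0}^{t} int_{t0}^{s} R_X(alpha, beta) d beta d alpha."""
    d_alpha = (t - t0) / n
    d_beta = (s - t0) / n
    total = 0.0
    for i in range(n):
        alpha = t0 + (i + 0.5) * d_alpha
        for j in range(n):
            beta = t0 + (j + 0.5) * d_beta
            total += r_x(alpha, beta)
    return total * d_alpha * d_beta

sigma2 = 1.0  # hypothetical variance parameter of the Wiener process
wiener_autocorrelation = lambda a, b: sigma2 * min(a, b)

# E[Y^2(2)] = R_Y(2, 2) for the integrated Wiener process.
second_moment = integrated_autocorrelation(wiener_autocorrelation, 2.0, 2.0)
```

For this kernel the double integral has the closed form \sigma^2 t^3 / 3, so `second_moment` should be close to 8/3.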


6.3 Power Spectral Densities 


In this section we assume that all random processes are WSS. 


A. Autocorrelation Functions: 


The autocorrelation function of a continuous-time random process X(t) is defined
as [Eq. (5.7)]

    R_X(\tau) = E[X(t) X(t + \tau)]    (6.12)

Properties of R_X(\tau):

    1. R_X(-\tau) = R_X(\tau)    (6.13)
    2. |R_X(\tau)| \le R_X(0)    (6.14)
    3. R_X(0) = E[X^2(t)] \ge 0    (6.15)

Property 3 [Eq. (6.15)] is easily obtained by setting \tau = 0 in Eq. (6.12). If we
assume that X(t) is a voltage waveform across a 1-\Omega resistor, then E[X^2(t)] is the
average value of power delivered to the 1-\Omega resistor by X(t). Thus, E[X^2(t)] is often
called the average power of X(t). Properties 1 and 2 are verified in Prob. 6.13.

In the case of a discrete-time random process X(n), the autocorrelation function of
X(n) is defined by

    R_X(k) = E[X(n) X(n + k)]    (6.16)

Various properties of R_X(k) similar to those of R_X(\tau) can be obtained by replacing \tau
by k in Eqs. (6.13) to (6.15).


B. Cross-Correlation Functions: 


The cross-correlation function of two continuous-time jointly WSS random
processes X(t) and Y(t) is defined by

    R_{XY}(\tau) = E[X(t) Y(t + \tau)]    (6.17)

Properties of R_{XY}(\tau):

    1. R_{XY}(-\tau) = R_{YX}(\tau)    (6.18)
    2. |R_{XY}(\tau)| \le \sqrt{R_X(0) R_Y(0)}    (6.19)
    3. |R_{XY}(\tau)| \le \frac{1}{2}[R_X(0) + R_Y(0)]    (6.20)

These properties are verified in Prob. 6.14. Two processes X(t) and Y(t) are called
(mutually) orthogonal if

    R_{XY}(\tau) = 0    for all \tau    (6.21)

Similarly, the cross-correlation function of two discrete-time jointly WSS random
processes X(n) and Y(n) is defined by

    R_{XY}(k) = E[X(n) Y(n + k)]    (6.22)

and various properties of R_{XY}(k) similar to those of R_{XY}(\tau) can be obtained by
replacing \tau by k in Eqs. (6.18) to (6.20).


C. Power Spectral Density: 


The power spectral density (or power spectrum) S_X(\omega) of a continuous-time
random process X(t) is defined as the Fourier transform of R_X(\tau):

    S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau) e^{-j\omega\tau} d\tau    (6.23)

Thus, taking the inverse Fourier transform of S_X(\omega), we obtain

    R_X(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_X(\omega) e^{j\omega\tau} d\omega    (6.24)

Equations (6.23) and (6.24) are known as the Wiener-Khinchin relations.

Properties of S_X(\omega):

    1. S_X(\omega) is real and S_X(\omega) \ge 0.    (6.25)
    2. S_X(-\omega) = S_X(\omega)    (6.26)
    3. E[X^2(t)] = R_X(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_X(\omega) d\omega    (6.27)
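
As a numerical illustration of Eq. (6.23), the sketch below transforms an assumed autocorrelation R_X(\tau) = e^{-a|\tau|} with a trapezoidal rule and compares it with the standard transform pair 2a/(a^2 + \omega^2). The autocorrelation, the decay rate a, and the function names are assumptions made for this example.

```python
import math

def psd_from_autocorrelation(r_x, omega, tau_max=40.0, n=200_000):
    """Approximate S_X(omega) = int R_X(tau) e^{-j omega tau} d tau.

    R_X is real and even, so the transform reduces to
    2 * int_0^inf R_X(tau) cos(omega tau) d tau (trapezoidal rule on [0, tau_max]).
    """
    step = tau_max / n
    total = 0.5 * (r_x(0.0) + r_x(tau_max) * math.cos(omega * tau_max))
    for i in range(1, n):
        tau = i * step
        total += r_x(tau) * math.cos(omega * tau)
    return 2.0 * step * total

a = 1.0  # hypothetical decay rate
exponential_acf = lambda tau: math.exp(-a * abs(tau))
closed_form = lambda omega: 2.0 * a / (a * a + omega * omega)

s_at_2 = psd_from_autocorrelation(exponential_acf, 2.0)
```

The computed values are real, nonnegative, and even in \omega, in line with properties (6.25) and (6.26).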


Similarly, the power spectral density S_X(\Omega) of a discrete-time random process X(n)
is defined as the Fourier transform of R_X(k):

    S_X(\Omega) = \sum_{k=-\infty}^{\infty} R_X(k) e^{-j\Omega k}    (6.28)

Thus, taking the inverse Fourier transform of S_X(\Omega), we obtain

    R_X(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_X(\Omega) e^{j\Omega k} d\Omega    (6.29)

Properties of S_X(\Omega):

    1. S_X(\Omega + 2\pi) = S_X(\Omega)    (6.30)
    2. S_X(\Omega) is real and S_X(\Omega) \ge 0.    (6.31)
    3. S_X(-\Omega) = S_X(\Omega)    (6.32)
    4. E[X^2(n)] = R_X(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_X(\Omega) d\Omega    (6.33)

Note that property 1 [Eq. (6.30)] follows from the fact that e^{-j\Omega k} is periodic with
period 2\pi. Hence, it is sufficient to define S_X(\Omega) only in the range (-\pi, \pi).


D. Cross Power Spectral Densities: 


The cross power spectral density (or cross power spectrum) S_{XY}(\omega) of two
continuous-time random processes X(t) and Y(t) is defined as the Fourier transform
of R_{XY}(\tau):

    S_{XY}(\omega) = \int_{-\infty}^{\infty} R_{XY}(\tau) e^{-j\omega\tau} d\tau    (6.34)

Thus, taking the inverse Fourier transform of S_{XY}(\omega), we get

    R_{XY}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_{XY}(\omega) e^{j\omega\tau} d\omega    (6.35)

Properties of S_{XY}(\omega):

Unlike S_X(\omega), which is a real-valued function of \omega, S_{XY}(\omega), in general, is a
complex-valued function.

    1. S_{XY}(\omega) = S_{YX}(-\omega)    (6.36)
    2. S_{XY}(-\omega) = S_{XY}^*(\omega)    (6.37)

Similarly, the cross power spectral density S_{XY}(\Omega) of two discrete-time random
processes X(n) and Y(n) is defined as the Fourier transform of R_{XY}(k):

    S_{XY}(\Omega) = \sum_{k=-\infty}^{\infty} R_{XY}(k) e^{-j\Omega k}    (6.38)

Thus, taking the inverse Fourier transform of S_{XY}(\Omega), we get

    R_{XY}(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_{XY}(\Omega) e^{j\Omega k} d\Omega    (6.39)

Properties of S_{XY}(\Omega):

Unlike S_X(\Omega), which is a real-valued function of \Omega, S_{XY}(\Omega), in general, is a
complex-valued function.

    1. S_{XY}(\Omega + 2\pi) = S_{XY}(\Omega)    (6.40)
    2. S_{XY}(\Omega) = S_{YX}(-\Omega)    (6.41)
    3. S_{XY}(-\Omega) = S_{XY}^*(\Omega)    (6.42)

6.4 White Noise 


A continuous-time white noise process, W(t), is a WSS zero-mean continuous-time
random process whose autocorrelation function is given by

    R_W(\tau) = \sigma^2 \delta(\tau)    (6.43)

where \delta(\tau) is a unit impulse function (or Dirac \delta function) defined by

    \int_{-\infty}^{\infty} \delta(\tau) \phi(\tau) d\tau = \phi(0)    (6.44)

where \phi(\tau) is any function continuous at \tau = 0. Taking the Fourier transform of Eq.
(6.43), we obtain

    S_W(\omega) = \sigma^2 \int_{-\infty}^{\infty} \delta(\tau) e^{-j\omega\tau} d\tau = \sigma^2    (6.45)

which indicates that W(t) has a constant power spectral density (hence the name
white noise). Note that the average power of W(t) is not finite.

Similarly, a WSS zero-mean discrete-time random process W(n) is called a
discrete-time white noise if its autocorrelation function is given by

    R_W(k) = \sigma^2 \delta(k)    (6.46)

where \delta(k) is a unit impulse sequence (or unit sample sequence) defined by

    \delta(k) = 1 for k = 0,    \delta(k) = 0 for k \ne 0    (6.47)

Taking the Fourier transform of Eq. (6.46), we obtain

    S_W(\Omega) = \sigma^2 \sum_{k=-\infty}^{\infty} \delta(k) e^{-j\Omega k} = \sigma^2    -\pi < \Omega < \pi    (6.48)

Again the power spectral density of W(n) is a constant. Note that S_W(\Omega + 2\pi) =
S_W(\Omega) and the average power of W(n) is \sigma^2 = Var[W(n)], which is finite.
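
Equation (6.46) can be illustrated with simulated data. The sketch below assumes, as one convenient example of discrete-time white noise, i.i.d. zero-mean Gaussian samples, and it estimates R_W(k) by a time average; the seed, the sample size, and the value of \sigma are arbitrary choices.

```python
import random

def sample_autocorrelation(x, k):
    """Time-average estimate of R_W(k) = E[W(n) W(n + k)]."""
    n = len(x) - k
    return sum(x[i] * x[i + k] for i in range(n)) / n

rng = random.Random(0)     # fixed seed so the run is repeatable
sigma = 2.0                # hypothetical standard deviation
w = [rng.gauss(0.0, sigma) for _ in range(50_000)]

# Estimates of R_W(0), R_W(1), R_W(2), R_W(3).
estimates = [sample_autocorrelation(w, k) for k in range(4)]
```

The lag-0 estimate should sit near \sigma^2 = 4 and the other lags near zero, up to sampling error of a few hundredths for this sample size.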


6.5 Response of Linear Systems to Random Inputs 


A. Linear Systems: 


A system is a mathematical model of a physical process that relates the input (or
excitation) signal x to the output (or response) signal y. Then the system is viewed
as a transformation (or mapping) of x into y. This transformation is represented by
the operator T as (Fig. 6-1)

    y = Tx    (6.49)

    x --> [ System T ] --> y
    Fig. 6-1

If x and y are continuous-time signals, then the system is called a continuous-time
system, and if x and y are discrete-time signals, then the system is called a
discrete-time system. If the operator T is a linear operator satisfying

    T{x_1 + x_2} = Tx_1 + Tx_2 = y_1 + y_2    (Additivity)

    T{\alpha x} = \alpha Tx = \alpha y    (Homogeneity)

where \alpha is a scalar number, then the system represented by T is called a linear
system. A system is called time-invariant if a time shift in the input signal causes
the same time shift in the output signal. Thus, for a continuous-time system,

    T{x(t - t_0)} = y(t - t_0)

for any value of t_0, and for a discrete-time system,

    T{x(n - n_0)} = y(n - n_0)

for any integer n_0. For a continuous-time linear time-invariant (LTI) system, Eq.
(6.49) can be expressed as

    y(t) = \int_{-\infty}^{\infty} h(\lambda) x(t - \lambda) d\lambda    (6.50)

where

    h(t) = T{\delta(t)}    (6.51)

is known as the impulse response of a continuous-time LTI system. The right-hand
side of Eq. (6.50) is commonly called the convolution integral of h(t) and x(t),
denoted by h(t) * x(t). For a discrete-time LTI system, Eq. (6.49) can be expressed
as

    y(n) = \sum_{i=-\infty}^{\infty} h(i) x(n - i)    (6.52)

where

    h(n) = T{\delta(n)}    (6.53)


is known as the impulse response (or unit sample response) of a discrete-time LTI 
system. The right-hand side of Eq. (6.52) is commonly called the convolution sum 
of h(n) and x(n), denoted by h(n) * x(n). 
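
The convolution sum of Eq. (6.52) is easy to write out for finite sequences. The sketch below assumes causal, finite-length h and x stored as Python lists (index 0 is n = 0) and then checks additivity, homogeneity, and time invariance on sample data; the impulse response and inputs are made-up values.

```python
def convolve(h, x):
    """Convolution sum y(n) = sum_i h(i) x(n - i) for finite causal sequences."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, h_i in enumerate(h):
        for k, x_k in enumerate(x):
            y[i + k] += h_i * x_k
    return y

h = [1.0, 0.5]                  # hypothetical impulse response
x1 = [1.0, 2.0, 3.0]
x2 = [0.0, -1.0, 4.0]
y1, y2 = convolve(h, x1), convolve(h, x2)

# Additivity and homogeneity: T{2 x1 + x2} = 2 y1 + y2
combined = convolve(h, [2 * u + v for u, v in zip(x1, x2)])
is_linear = combined == [2 * u + v for u, v in zip(y1, y2)]

# Time invariance: delaying the input by one sample delays the output by one sample
is_time_invariant = convolve(h, [0.0] + x1) == [0.0] + y1
```

The sample values are dyadic fractions, so the exact floating-point comparisons are safe here; for general data a tolerance would be the better choice.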


B. Response of a Continuous-Time Linear System to Random Input: 


When the input to a continuous-time linear system represented by Eq. (6.49) is a
random process {X(t), t \in T_x}, then the output will also be a random process {Y(t),
t \in T_y}; that is,

    T{X(t), t \in T_x} = {Y(t), t \in T_y}    (6.54)

For any input sample function x_i(t), the corresponding output sample function is

    y_i(t) = T{x_i(t)}    (6.55)

If the system is LTI, then by Eq. (6.50), we can write

    Y(t) = \int_{-\infty}^{\infty} h(\lambda) X(t - \lambda) d\lambda    (6.56)

Note that Eq. (6.56) is a stochastic integral. Then

    E[Y(t)] = \int_{-\infty}^{\infty} h(\lambda) E[X(t - \lambda)] d\lambda    (6.57)

The autocorrelation function of Y(t) is given by (Prob. 6.24)

    R_Y(t, s) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(\alpha) h(\beta) R_X(t - \alpha, s - \beta) d\alpha d\beta    (6.58)

If the input X(t) is WSS, then from Eq. (6.57),

    E[Y(t)] = \mu_X \int_{-\infty}^{\infty} h(\lambda) d\lambda = \mu_X H(0)    (6.59)

where H(0) = H(\omega)|_{\omega = 0} and H(\omega) is the frequency response of the system defined
by the Fourier transform of h(t); that is,

    H(\omega) = \int_{-\infty}^{\infty} h(t) e^{-j\omega t} dt    (6.60)

The autocorrelation function of Y(t) is, from Eq. (6.58),

    R_Y(t, s) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(\alpha) h(\beta) R_X(s - t + \alpha - \beta) d\alpha d\beta    (6.61)

Setting s = t + \tau, we get

    R_Y(t, t + \tau) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(\alpha) h(\beta) R_X(\tau + \alpha - \beta) d\alpha d\beta = R_Y(\tau)    (6.62)

From Eqs. (6.59) and (6.62), we see that the output Y(t) is also WSS. Taking the
Fourier transform of Eq. (6.62), the power spectral density of Y(t) is given by
(Prob. 6.25)

    S_Y(\omega) = \int_{-\infty}^{\infty} R_Y(\tau) e^{-j\omega\tau} d\tau = |H(\omega)|^2 S_X(\omega)    (6.63)

Thus, we obtain the important result that the power spectral density of the output is
the product of the power spectral density of the input and the magnitude squared of
the frequency response of the system.

When the autocorrelation function of the output R_Y(\tau) is desired, it is easier to
determine the power spectral density S_Y(\omega) and then evaluate the inverse Fourier
transform (Prob. 6.26). Thus,

    R_Y(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_Y(\omega) e^{j\omega\tau} d\omega = \frac{1}{2\pi} \int_{-\infty}^{\infty} |H(\omega)|^2 S_X(\omega) e^{j\omega\tau} d\omega    (6.64)

By Eq. (6.15), the average power in the output Y(t) is

    E[Y^2(t)] = R_Y(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} |H(\omega)|^2 S_X(\omega) d\omega    (6.65)

C. Response of a Discrete-Time Linear System to Random Input: 
When the input to a discrete-time LTI system is a discrete-time random process
X(n), then by Eq. (6.52), the output Y(n) is

    Y(n) = \sum_{i=-\infty}^{\infty} h(i) X(n - i)    (6.66)

The autocorrelation function of Y(n) is given by

    R_Y(n, m) = \sum_{i=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} h(i) h(l) R_X(n - i, m - l)    (6.67)

When X(n) is WSS, then from Eq. (6.66),

    E[Y(n)] = \mu_X \sum_{i=-\infty}^{\infty} h(i) = \mu_X H(0)    (6.68)

where H(0) = H(\Omega)|_{\Omega = 0} and H(\Omega) is the frequency response of the system defined
by the Fourier transform of h(n):

    H(\Omega) = \sum_{n=-\infty}^{\infty} h(n) e^{-j\Omega n}    (6.69)

The autocorrelation function of Y(n) is, from Eq. (6.67),

    R_Y(n, m) = \sum_{i=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} h(i) h(l) R_X(m - n + i - l)    (6.70)

Setting m = n + k, we get

    R_Y(n, n + k) = \sum_{i=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} h(i) h(l) R_X(k + i - l) = R_Y(k)    (6.71)

From Eqs. (6.68) and (6.71), we see that the output Y(n) is also WSS. Taking the
Fourier transform of Eq. (6.71), the power spectral density of Y(n) is given by
(Prob. 6.28)

    S_Y(\Omega) = |H(\Omega)|^2 S_X(\Omega)    (6.72)



which is the same as Eq. (6.63). 
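
Equations (6.69), (6.71), and (6.72) can be cross-checked for a white-noise input. With R_X(k) = \sigma^2 \delta(k), Eq. (6.71) collapses to R_Y(k) = \sigma^2 \sum_i h(i) h(i + k); the sketch below transforms that sequence and compares it with |H(\Omega)|^2 \sigma^2. The FIR impulse response and the variance are hypothetical values chosen for the example.

```python
import cmath

def frequency_response(h, omega):
    """H(Omega) = sum_n h(n) e^{-j Omega n} for a finite causal h, Eq. (6.69)."""
    return sum(h_n * cmath.exp(-1j * omega * n) for n, h_n in enumerate(h))

def output_autocorrelation(h, sigma2, k):
    """R_Y(k) from Eq. (6.71) when R_X(k) = sigma2 * delta(k) and h is real."""
    k = abs(k)
    return sigma2 * sum(h[i] * h[i + k] for i in range(len(h) - k))

def output_psd(h, sigma2, omega):
    """S_Y(Omega) computed as the Fourier transform of R_Y(k)."""
    lags = range(-(len(h) - 1), len(h))
    return sum(output_autocorrelation(h, sigma2, k) * cmath.exp(-1j * omega * k)
               for k in lags).real

h = [1.0, -0.5, 0.25]   # hypothetical FIR impulse response
sigma2 = 2.0            # hypothetical input variance, so S_X(Omega) = sigma2

psd_direct = output_psd(h, sigma2, 0.7)
psd_via_h = abs(frequency_response(h, 0.7)) ** 2 * sigma2
```

The two quantities agree to rounding error at every frequency tried, which is Eq. (6.72) specialized to S_X(\Omega) = \sigma^2.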
6.6 Fourier Series and Karhunen-Loéve Expansions 


A. Stochastic Periodicity: 


A continuous-time random process X(t) is said to be m.s. periodic with period T if

    E{[X(t + T) - X(t)]^2} = 0    (6.73)

If X(t) is WSS, then X(t) is m.s. periodic if and only if its autocorrelation function
is periodic with period T; that is,

    R_X(\tau + T) = R_X(\tau)    (6.74)


B. Fourier Series: 


Let X(t) be a WSS random process with periodic R_X(\tau) having period T. Expanding
R_X(\tau) into a Fourier series, we obtain

    R_X(\tau) = \sum_{n=-\infty}^{\infty} c_n e^{jn\omega_0\tau}    \omega_0 = 2\pi/T    (6.75)

where

    c_n = \frac{1}{T} \int_{0}^{T} R_X(\tau) e^{-jn\omega_0\tau} d\tau    (6.76)

Let \hat{X}(t) be expressed as

    \hat{X}(t) = \sum_{n=-\infty}^{\infty} X_n e^{jn\omega_0 t}    \omega_0 = 2\pi/T    (6.77)

where X_n are r.v.’s given by

    X_n = \frac{1}{T} \int_{0}^{T} X(t) e^{-jn\omega_0 t} dt    (6.78)

Note that, in general, X_n are complex-valued r.v.’s. For complex-valued r.v.’s, the
correlation between two r.v.’s X and Y is defined by E(XY*). Then \hat{X}(t) is called the
m.s. Fourier series of X(t) such that (Prob. 6.34)

    E{|X(t) - \hat{X}(t)|^2} = 0    (6.79)

Furthermore, we have (Prob. 6.33)

    E(X_n) = \mu_X \delta(n) = \mu_X for n = 0,    0 for n \ne 0    (6.80)

    E(X_n X_m^*) = c_n \delta(n - m) = c_n for n = m,    0 for n \ne m    (6.81)

C. Karhunen-Loéve Expansion 


Consider a random process X(t) which is not periodic. Let \hat{X}(t) be expressed as

    \hat{X}(t) = \sum_{n=1}^{\infty} X_n \phi_n(t)    0 < t < T    (6.82)

where a set of functions {\phi_n(t)} is orthonormal on an interval (0, T) such that

    \int_{0}^{T} \phi_n(t) \phi_m^*(t) dt = \delta(n - m)    (6.83)

and X_n are r.v.’s given by

    X_n = \int_{0}^{T} X(t) \phi_n^*(t) dt    (6.84)

Then \hat{X}(t) is called the Karhunen-Loéve expansion of X(t) such that (Prob. 6.38)

    E{|X(t) - \hat{X}(t)|^2} = 0    (6.85)

Let R_X(t, s) be the autocorrelation function of X(t), and consider the following
integral equation:

    \int_{0}^{T} R_X(t, s) \phi_n(s) ds = \lambda_n \phi_n(t)    0 < t < T    (6.86)

where \lambda_n and \phi_n(t) are called the eigenvalues and the corresponding eigenfunctions
of the integral equation (6.86). It is known from the theory of integral equations
that if R_X(t, s) is continuous, then \phi_n(t) of Eq. (6.86) are orthonormal as in Eq.
(6.83), and they satisfy the following identity:

    R_X(t, s) = \sum_{n=1}^{\infty} \lambda_n \phi_n(t) \phi_n^*(s)    (6.87)

which is known as Mercer’s theorem.
With the above results, we can show that Eq. (6.85) is satisfied and the
coefficients X_n are orthogonal r.v.’s (Prob. 6.37); that is,

    E(X_n X_m^*) = \lambda_n \delta(n - m) = \lambda_n for n = m,    0 for n \ne m    (6.88)
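
The integral equation (6.86) can be solved approximately by discretizing the kernel. As a sketch under stated assumptions, the code below uses the Wiener autocorrelation R_X(t, s) = \sigma^2 min(t, s) on (0, T), a midpoint grid, and plain power iteration to estimate the largest eigenvalue; for this kernel the known closed form is \lambda_1 = 4\sigma^2 T^2 / \pi^2. The function name, grid size, and iteration count are arbitrary choices.

```python
import math

def largest_kl_eigenvalue(kernel, T=1.0, n=100, iterations=50):
    """Power iteration on the midpoint discretization of Eq. (6.86)."""
    step = T / n
    grid = [(i + 0.5) * step for i in range(n)]
    # Matrix entries approximate the operator: phi -> int_0^T kernel(t, s) phi(s) ds.
    matrix = [[kernel(t, s) * step for s in grid] for t in grid]
    phi = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        image = [sum(row[j] * phi[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in image))
        eigenvalue = norm / math.sqrt(sum(v * v for v in phi))
        phi = [v / norm for v in image]     # renormalize the eigenfunction estimate
    return eigenvalue

sigma2 = 1.0  # hypothetical variance parameter of the Wiener process
wiener_kernel = lambda t, s: sigma2 * min(t, s)
lambda_1 = largest_kl_eigenvalue(wiener_kernel)
```

By Eq. (6.88), `lambda_1` is also the variance of the first expansion coefficient X_1, so the leading term carries most of the process energy on (0, T).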
6.7 Fourier Transform of Random Processes 


A. Continuous-Time Random Processes: 


The Fourier transform of a continuous-time random process X(t) is a random
process \tilde{X}(\omega) given by

    \tilde{X}(\omega) = \int_{-\infty}^{\infty} X(t) e^{-j\omega t} dt    (6.89)

which is the stochastic integral, and the integral is interpreted as a m.s. limit; that
is,

    E\left\{ \left| \tilde{X}(\omega) - \int_{-\infty}^{\infty} X(t) e^{-j\omega t} dt \right|^2 \right\} = 0    (6.90)

Note that \tilde{X}(\omega) is a complex random process. Similarly, the inverse Fourier
transform

    X(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{X}(\omega) e^{j\omega t} d\omega    (6.91)

is also a stochastic integral and should also be interpreted in the m.s. sense. The
properties of continuous-time Fourier transforms (Appendix B) also hold for
random processes (or random signals). For instance, if Y(t) is the output of a
continuous-time LTI system with input X(t), then

    \tilde{Y}(\omega) = \tilde{X}(\omega) H(\omega)    (6.92)

where H(\omega) is the frequency response of the system.
Let \tilde{R}_X(\omega_1, \omega_2) be the two-dimensional Fourier transform of R_X(t, s); that is,

    \tilde{R}_X(\omega_1, \omega_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} R_X(t, s) e^{-j(\omega_1 t + \omega_2 s)} dt ds    (6.93)

Then the autocorrelation function of \tilde{X}(\omega) is given by (Prob. 6.41)

    R_{\tilde{X}}(\omega_1, \omega_2) = E[\tilde{X}(\omega_1) \tilde{X}^*(\omega_2)] = \tilde{R}_X(\omega_1, -\omega_2)    (6.94)

If X(t) is real, then

    E[\tilde{X}(\omega_1) \tilde{X}(\omega_2)] = \tilde{R}_X(\omega_1, \omega_2)    (6.95)
    \tilde{X}(-\omega) = \tilde{X}^*(\omega)    (6.96)
    \tilde{R}_X(-\omega_1, -\omega_2) = \tilde{R}_X^*(\omega_1, \omega_2)    (6.97)

If X(t) is a WSS random process with autocorrelation function R_X(t, s) = R_X(t - s) =
R_X(\tau) and power spectral density S_X(\omega), then (Prob. 6.42)

    \tilde{R}_X(\omega_1, \omega_2) = 2\pi S_X(\omega_1) \delta(\omega_1 + \omega_2)    (6.98)

    R_{\tilde{X}}(\omega_1, \omega_2) = 2\pi S_X(\omega_1) \delta(\omega_1 - \omega_2)    (6.99)

Equation (6.99) shows that the Fourier transform of a WSS random process is
nonstationary white noise.


B. Discrete-Time Random Processes: 


The Fourier transform of a discrete-time random process X(n) is a random process
\tilde{X}(\Omega) given by (in m.s. sense)

    \tilde{X}(\Omega) = \sum_{n=-\infty}^{\infty} X(n) e^{-j\Omega n}    (6.100)

Similarly, the inverse Fourier transform

    X(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \tilde{X}(\Omega) e^{j\Omega n} d\Omega    (6.101)

should also be interpreted in the m.s. sense. Note that \tilde{X}(\Omega + 2\pi) = \tilde{X}(\Omega) and
the properties of discrete-time Fourier transforms (Appendix B) also hold for
discrete-time random signals. For instance, if Y(n) is the output of a discrete-time
LTI system with input X(n), then

    \tilde{Y}(\Omega) = \tilde{X}(\Omega) H(\Omega)    (6.102)

where H(\Omega) is the frequency response of the system.
Let \tilde{R}_X(\Omega_1, \Omega_2) be the two-dimensional Fourier transform of R_X(n, m):

    \tilde{R}_X(\Omega_1, \Omega_2) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} R_X(n, m) e^{-j(\Omega_1 n + \Omega_2 m)}    (6.103)

Then the autocorrelation function of \tilde{X}(\Omega) is given by (Prob. 6.44)

    R_{\tilde{X}}(\Omega_1, \Omega_2) = E[\tilde{X}(\Omega_1) \tilde{X}^*(\Omega_2)] = \tilde{R}_X(\Omega_1, -\Omega_2)    (6.104)

If X(n) is a WSS random process with autocorrelation function R_X(n, m) = R_X(n -
m) = R_X(k) and power spectral density S_X(\Omega), then

    \tilde{R}_X(\Omega_1, \Omega_2) = 2\pi S_X(\Omega_1) \delta(\Omega_1 + \Omega_2)    (6.105)

    R_{\tilde{X}}(\Omega_1, \Omega_2) = 2\pi S_X(\Omega_1) \delta(\Omega_1 - \Omega_2)    (6.106)

Equation (6.106) shows that the Fourier transform of a discrete-time WSS random
process is nonstationary white noise.


SOLVED PROBLEMS 


Continuity, Differentiation, Integration 


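6.1. Show that the random process X(t) is m.s. continuous if and only if its
autocorrelation function R_X(t, s) is continuous.

We can write

    E{[X(t + \epsilon) - X(t)]^2} = E[X^2(t + \epsilon) - 2X(t + \epsilon)X(t) + X^2(t)]
                                  = R_X(t + \epsilon, t + \epsilon) - 2R_X(t + \epsilon, t) + R_X(t, t)    (6.107)

Thus, if R_X(t, s) is continuous, then

    \lim_{\epsilon \to 0} E{[X(t + \epsilon) - X(t)]^2} = \lim_{\epsilon \to 0} {R_X(t + \epsilon, t + \epsilon) - 2R_X(t + \epsilon, t) + R_X(t, t)} = 0

and X(t) is m.s. continuous. Next, consider

    R_X(t + \epsilon_1, t + \epsilon_2) - R_X(t, t) = E{[X(t + \epsilon_1) - X(t)][X(t + \epsilon_2) - X(t)]}
        + E{[X(t + \epsilon_1) - X(t)]X(t)} + E{[X(t + \epsilon_2) - X(t)]X(t)}

Applying the Cauchy-Schwarz inequality (3.97) (Prob. 3.35), we obtain

    |R_X(t + \epsilon_1, t + \epsilon_2) - R_X(t, t)| \le (E{[X(t + \epsilon_1) - X(t)]^2} E{[X(t + \epsilon_2) - X(t)]^2})^{1/2}
        + (E{[X(t + \epsilon_1) - X(t)]^2} E[X^2(t)])^{1/2} + (E{[X(t + \epsilon_2) - X(t)]^2} E[X^2(t)])^{1/2}

Thus, if X(t) is m.s. continuous, then by Eq. (6.1) we have

    \lim_{\epsilon_1, \epsilon_2 \to 0} [R_X(t + \epsilon_1, t + \epsilon_2) - R_X(t, t)] = 0

that is, R_X(t, s) is continuous. This completes the proof.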


6.2. Show that a WSS random process X(t) is m.s. continuous if and only if its
autocorrelation function R_X(\tau) is continuous at \tau = 0.

If X(t) is WSS, then Eq. (6.107) becomes

    E{[X(t + \epsilon) - X(t)]^2} = 2[R_X(0) - R_X(\epsilon)]    (6.108)

Thus if R_X(\tau) is continuous at \tau = 0, that is,

    \lim_{\epsilon \to 0} [R_X(\epsilon) - R_X(0)] = 0

then

    \lim_{\epsilon \to 0} E{[X(t + \epsilon) - X(t)]^2} = 0

that is, X(t) is m.s. continuous. Similarly, we can show that if X(t) is m.s.
continuous, then by Eq. (6.108), R_X(\tau) is continuous at \tau = 0.

6.3. Show that if X(t) is m.s. continuous, then its mean is continuous; that is,

    \lim_{\epsilon \to 0} \mu_X(t + \epsilon) = \mu_X(t)

We have

    Var[X(t + \epsilon) - X(t)] = E{[X(t + \epsilon) - X(t)]^2} - {E[X(t + \epsilon) - X(t)]}^2 \ge 0

Thus,

    E{[X(t + \epsilon) - X(t)]^2} \ge {E[X(t + \epsilon) - X(t)]}^2 = [\mu_X(t + \epsilon) - \mu_X(t)]^2

If X(t) is m.s. continuous, then as \epsilon \to 0, the left-hand side of the above
expression approaches zero. Thus,

    \lim_{\epsilon \to 0} [\mu_X(t + \epsilon) - \mu_X(t)] = 0    or    \lim_{\epsilon \to 0} \mu_X(t + \epsilon) = \mu_X(t)

6.4. Show that the Wiener process X(t) is m.s. continuous.

From Eq. (5.64), the autocorrelation function of the Wiener process X(t) is
given by

    R_X(t, s) = \sigma^2 min(t, s)

Thus, we have

    |R_X(t + \epsilon_1, t + \epsilon_2) - R_X(t, t)| = \sigma^2 |min(t + \epsilon_1, t + \epsilon_2) - t| \le \sigma^2 max(\epsilon_1, \epsilon_2)


Since

    \lim_{\epsilon_1, \epsilon_2 \to 0} max(\epsilon_1, \epsilon_2) = 0

R_X(t, s) is continuous. Hence, the Wiener process X(t) is m.s. continuous.

6.5. Show that every m.s. continuous random process is continuous in
probability.

A random process X(t) is continuous in probability if, for every t and a > 0
(see Prob. 5.62),

    \lim_{\epsilon \to 0} P{|X(t + \epsilon) - X(t)| > a} = 0

Applying the Chebyshev inequality (2.116) (Prob. 2.39), we have

    P{|X(t + \epsilon) - X(t)| > a} \le \frac{E[|X(t + \epsilon) - X(t)|^2]}{a^2}

Now, if X(t) is m.s. continuous, then the right-hand side goes to 0 as \epsilon \to 0,
which implies that the left-hand side must also go to 0 as \epsilon \to 0. Thus, we
have proved that if X(t) is m.s. continuous, then it is also continuous in
probability.

6.6. Show that a random process X(t) has a m.s. derivative X'(t) if \partial^2 R_X(t, s)/\partial t \partial s
exists at s = t.


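Let

    Y(t; \epsilon) = \frac{X(t + \epsilon) - X(t)}{\epsilon}    (6.109)

By the Cauchy criterion (see the note at the end of this solution), the m.s.
derivative X'(t) exists if

    \lim_{\epsilon_1, \epsilon_2 \to 0} E{[Y(t; \epsilon_2) - Y(t; \epsilon_1)]^2} = 0    (6.110)

Now

    E{[Y(t; \epsilon_2) - Y(t; \epsilon_1)]^2} = E[Y^2(t; \epsilon_2)] - 2E[Y(t; \epsilon_2)Y(t; \epsilon_1)] + E[Y^2(t; \epsilon_1)]    (6.111)

and

    E[Y(t; \epsilon_2)Y(t; \epsilon_1)] = \frac{1}{\epsilon_1 \epsilon_2} E{[X(t + \epsilon_2) - X(t)][X(t + \epsilon_1) - X(t)]}
        = \frac{1}{\epsilon_1 \epsilon_2} [R_X(t + \epsilon_2, t + \epsilon_1) - R_X(t + \epsilon_2, t) - R_X(t, t + \epsilon_1) + R_X(t, t)]
        = \frac{1}{\epsilon_2} \left\{ \frac{R_X(t + \epsilon_2, t + \epsilon_1) - R_X(t + \epsilon_2, t)}{\epsilon_1} - \frac{R_X(t, t + \epsilon_1) - R_X(t, t)}{\epsilon_1} \right\}

Thus,

    \lim_{\epsilon_1, \epsilon_2 \to 0} E[Y(t; \epsilon_2)Y(t; \epsilon_1)] = \frac{\partial^2 R_X(t, s)}{\partial t \partial s} \Big|_{s = t} = R_2    (6.112)

provided \partial^2 R_X(t, s)/\partial t \partial s exists at s = t. Setting \epsilon_1 = \epsilon_2 in Eq. (6.112), we get

    \lim_{\epsilon_1 \to 0} E[Y^2(t; \epsilon_1)] = \lim_{\epsilon_2 \to 0} E[Y^2(t; \epsilon_2)] = R_2

and by Eq. (6.111), we obtain

    \lim_{\epsilon_1, \epsilon_2 \to 0} E{[Y(t; \epsilon_2) - Y(t; \epsilon_1)]^2} = R_2 - 2R_2 + R_2 = 0    (6.113)

Thus, we conclude that X(t) has a m.s. derivative X'(t) if \partial^2 R_X(t, s)/\partial t \partial s
exists at s = t. If X(t) is WSS, then the above conclusion is equivalent to the
existence of d^2 R_X(\tau)/d\tau^2 at \tau = 0.

Note: In real analysis, a function g(\epsilon) of some parameter \epsilon converges to a
finite value if

    \lim_{\epsilon_1, \epsilon_2 \to 0} [g(\epsilon_2) - g(\epsilon_1)] = 0

This is known as the Cauchy criterion.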


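6.7. Suppose a random process X(t) has a m.s. derivative X'(t).
(a) Find E[X'(t)].
(b) Find the cross-correlation function of X(t) and X'(t).
(c) Find the autocorrelation function of X'(t).

(a) We have

    E[X'(t)] = E\left[ l.i.m._{\epsilon \to 0} \frac{X(t + \epsilon) - X(t)}{\epsilon} \right]
             = \lim_{\epsilon \to 0} E\left[ \frac{X(t + \epsilon) - X(t)}{\epsilon} \right]
             = \lim_{\epsilon \to 0} \frac{\mu_X(t + \epsilon) - \mu_X(t)}{\epsilon} = \mu_X'(t)    (6.114)

(b) From Eq. (6.17), the cross-correlation function of X(t) and X'(t) is

    R_{XX'}(t, s) = E[X(t)X'(s)] = E\left[ X(t) l.i.m._{\epsilon \to 0} \frac{X(s + \epsilon) - X(s)}{\epsilon} \right]
                  = \lim_{\epsilon \to 0} \frac{E[X(t)X(s + \epsilon)] - E[X(t)X(s)]}{\epsilon}
                  = \lim_{\epsilon \to 0} \frac{R_X(t, s + \epsilon) - R_X(t, s)}{\epsilon} = \frac{\partial R_X(t, s)}{\partial s}    (6.115)

(c) Using Eq. (6.115), the autocorrelation function of X'(t) is

    R_{X'}(t, s) = E[X'(t)X'(s)] = E\left\{ \left[ l.i.m._{\epsilon \to 0} \frac{X(t + \epsilon) - X(t)}{\epsilon} \right] X'(s) \right\}
                 = \lim_{\epsilon \to 0} \frac{E[X(t + \epsilon)X'(s)] - E[X(t)X'(s)]}{\epsilon}
                 = \lim_{\epsilon \to 0} \frac{R_{XX'}(t + \epsilon, s) - R_{XX'}(t, s)}{\epsilon}
                 = \frac{\partial R_{XX'}(t, s)}{\partial t} = \frac{\partial^2 R_X(t, s)}{\partial t \partial s}    (6.116)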


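6.8. If X(t) is a WSS random process and has a m.s. derivative X'(t), then show that

(a) $$R_{XX'}(\tau) = \frac{d}{d\tau} R_X(\tau)$$ (6.117)

(b) $$R_{X'}(\tau) = -\frac{d^2}{d\tau^2} R_X(\tau)$$ (6.118)

(a) For a WSS process X(t), $R_X(t, s) = R_X(s - t)$. Thus, setting $s - t = \tau$ in Eq. (6.115) of Prob. 6.7, we obtain $\partial R_X(s - t)/\partial s = dR_X(\tau)/d\tau$ and

$$R_{XX'}(t, t+\tau) = R_{XX'}(\tau) = \frac{dR_X(\tau)}{d\tau}$$

(b) Now $\partial R_X(s - t)/\partial t = -dR_X(\tau)/d\tau$. Thus, $\partial^2 R_X(s - t)/\partial t\,\partial s = -d^2 R_X(\tau)/d\tau^2$, and by Eq. (6.116) of Prob. 6.7, we have

$$R_{X'}(t, t+\tau) = R_{X'}(\tau) = -\frac{d^2}{d\tau^2} R_X(\tau)$$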


6.9. Show that the Wiener process X(t) does not have a m.s. derivative. 


From Eq. (5.64), the autocorrelation function of the Wiener process X(t) is given by

$$R_X(t, s) = \sigma^2 \min(t, s) = \begin{cases} \sigma^2 s & t > s \\ \sigma^2 t & t < s \end{cases}$$

Thus,

$$\frac{\partial}{\partial s} R_X(t, s) = \sigma^2 u(t - s) = \begin{cases} \sigma^2 & t > s \\ 0 & t < s \end{cases}$$ (6.119)

where u(t - s) is a unit step function defined by

$$u(t - s) = \begin{cases} 1 & t > s \\ 0 & t < s \end{cases}$$

and it is not continuous at s = t (Fig. 6-2). Thus, $\partial^2 R_X(t, s)/\partial t\,\partial s$ does not exist at s = t, and the Wiener process X(t) does not have a m.s. derivative.

Fig. 6-2 Shifted unit step function.

Note that although a m.s. derivative does not exist for the Wiener process, we can define a generalized derivative of the Wiener process (see Prob. 6.20).
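The failure of the m.s. derivative can also be seen directly from the difference quotient of Prob. 6.6. The sketch below is only an illustration, not part of the original solution: it expands the mean square of [X(t + eps) - X(t)]/eps in terms of the Wiener autocorrelation and shows that it grows like sigma^2/eps as eps shrinks. The function names and the default sigma2 = 1 are assumptions made for the example.

```python
def wiener_autocorr(t, s, sigma2=1.0):
    """R_X(t, s) = sigma^2 * min(t, s) for the Wiener process, Eq. (5.64)."""
    return sigma2 * min(t, s)


def diff_quotient_mean_square(t, eps, sigma2=1.0):
    """E{[(X(t+eps) - X(t)) / eps]^2}, expanded in terms of R_X."""
    r = wiener_autocorr
    numerator = (r(t + eps, t + eps, sigma2)
                 - 2.0 * r(t + eps, t, sigma2)
                 + r(t, t, sigma2))
    return numerator / eps ** 2


if __name__ == "__main__":
    # The mean square equals sigma^2 / eps, so it diverges as eps -> 0.
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, diff_quotient_mean_square(1.0, eps))
```

Because the printed values grow without bound, the difference quotient has no mean-square limit, which agrees with the conclusion reached above from Eq. (6.119).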


6.10. Show that if X(t) is a normal random process for which the m.s. derivative X'(t) exists, then X'(t) is also a normal random process.

Let X(t) be a normal random process. Now consider

$$Y_\varepsilon(t) = \frac{X(t+\varepsilon) - X(t)}{\varepsilon}$$

Then, the n r.v.'s $Y_\varepsilon(t_1), Y_\varepsilon(t_2), \ldots, Y_\varepsilon(t_n)$ are given by a linear transformation of the jointly normal r.v.'s $X(t_1), X(t_1+\varepsilon), X(t_2), X(t_2+\varepsilon), \ldots, X(t_n), X(t_n+\varepsilon)$. It then follows by the result of Prob. 5.60 that $Y_\varepsilon(t_1), Y_\varepsilon(t_2), \ldots, Y_\varepsilon(t_n)$ are jointly normal r.v.'s, and hence $Y_\varepsilon(t)$ is a normal random process. Thus, we conclude that the m.s. derivative X'(t), which is the limit of $Y_\varepsilon(t)$ as $\varepsilon \to 0$, is also a normal random process, since m.s. convergence implies convergence in probability (see Prob. 6.5).


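6.11. Show that the m.s. integral of a random process X(t) exists if the following integral exists:

$$\int_{t_0}^{t} \int_{t_0}^{t} R_X(\alpha, \beta)\,d\alpha\,d\beta$$

A m.s. integral of X(t) is defined by [Eq. (6.8)]

$$Y(t) = \int_{t_0}^{t} X(\alpha)\,d\alpha = \operatorname*{l.i.m.}_{\Delta t_i \to 0} \sum_i X(t_i)\,\Delta t_i$$

Again using the Cauchy criterion, the m.s. integral Y(t) of X(t) exists if

$$\lim_{\Delta t_i, \Delta t_k \to 0} E\left\{\left[\sum_i X(t_i)\,\Delta t_i - \sum_k X(t_k)\,\Delta t_k\right]^2\right\} = 0$$ (6.120)

As in the case of the m.s. derivative [Eq. (6.111)], expanding the square, we obtain

$$E\left\{\left[\sum_i X(t_i)\,\Delta t_i - \sum_k X(t_k)\,\Delta t_k\right]^2\right\}$$

$$= E\left[\sum_i \sum_j X(t_i)X(t_j)\,\Delta t_i\,\Delta t_j + \sum_k \sum_l X(t_k)X(t_l)\,\Delta t_k\,\Delta t_l - 2\sum_i \sum_k X(t_i)X(t_k)\,\Delta t_i\,\Delta t_k\right]$$

$$= \sum_i \sum_j R_X(t_i, t_j)\,\Delta t_i\,\Delta t_j + \sum_k \sum_l R_X(t_k, t_l)\,\Delta t_k\,\Delta t_l - 2\sum_i \sum_k R_X(t_i, t_k)\,\Delta t_i\,\Delta t_k$$

and Eq. (6.120) holds if

$$\lim_{\Delta t_i, \Delta t_k \to 0} \sum_i \sum_k R_X(t_i, t_k)\,\Delta t_i\,\Delta t_k$$

exists, or, equivalently,

$$\int_{t_0}^{t} \int_{t_0}^{t} R_X(\alpha, \beta)\,d\alpha\,d\beta$$

exists.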


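6.12. Let X(t) be the Wiener process with parameter $\sigma^2$. Let

$$Y(t) = \int_0^t X(\alpha)\,d\alpha$$

(a) Find the mean and the variance of Y(t).
(b) Find the autocorrelation function of Y(t).

(a) By assumption 3 of the Wiener process (Sec. 5.7), that is, E[X(t)] = 0, we have

$$E[Y(t)] = E\left[\int_0^t X(\alpha)\,d\alpha\right] = \int_0^t E[X(\alpha)]\,d\alpha = 0$$ (6.121)

Then

$$\operatorname{Var}[Y(t)] = E[Y^2(t)] = \int_0^t \int_0^t E[X(\alpha)X(\beta)]\,d\alpha\,d\beta = \int_0^t \int_0^t R_X(\alpha, \beta)\,d\alpha\,d\beta$$

By Eq. (5.64), $R_X(\alpha, \beta) = \sigma^2 \min(\alpha, \beta)$; thus, referring to Fig. 6-3, we obtain

$$\operatorname{Var}[Y(t)] = \sigma^2 \int_0^t \int_0^t \min(\alpha, \beta)\,d\alpha\,d\beta = \sigma^2 \int_0^t d\beta \int_0^\beta \alpha\,d\alpha + \sigma^2 \int_0^t d\alpha \int_0^\alpha \beta\,d\beta = \frac{\sigma^2 t^3}{3}$$ (6.122)

(b) Let $t > s \ge 0$ and write

$$Y(t) = \int_0^s X(\alpha)\,d\alpha + \int_s^t [X(\alpha) - X(s)]\,d\alpha + (t - s)X(s) = Y(s) + \int_s^t [X(\alpha) - X(s)]\,d\alpha + (t - s)X(s)$$

Then, for $t > s \ge 0$,

$$R_Y(t, s) = E[Y(t)Y(s)] = E[Y^2(s)] + \int_s^t E\{[X(\alpha) - X(s)]Y(s)\}\,d\alpha + (t - s)E[X(s)Y(s)]$$ (6.123)

Now by Eq. (6.122),

$$E[Y^2(s)] = \operatorname{Var}[Y(s)] = \frac{\sigma^2 s^3}{3}$$

Using assumptions 1, 3, and 4 of the Wiener process (Sec. 5.7), and since $s \le \alpha \le t$, we have

$$\int_s^t E\{[X(\alpha) - X(s)]Y(s)\}\,d\alpha = \int_s^t E\left\{[X(\alpha) - X(s)] \int_0^s X(\beta)\,d\beta\right\} d\alpha$$

$$= \int_s^t \int_0^s E\{[X(\alpha) - X(s)][X(\beta) - X(0)]\}\,d\beta\,d\alpha = \int_s^t \int_0^s E[X(\alpha) - X(s)]\,E[X(\beta) - X(0)]\,d\beta\,d\alpha = 0$$

Finally, for $0 \le \beta \le s$,

$$(t - s)E[X(s)Y(s)] = (t - s)\int_0^s E[X(s)X(\beta)]\,d\beta = (t - s)\int_0^s R_X(s, \beta)\,d\beta$$

$$= (t - s)\int_0^s \sigma^2 \min(s, \beta)\,d\beta = \sigma^2 (t - s)\int_0^s \beta\,d\beta = \sigma^2 (t - s)\frac{s^2}{2}$$

Substituting these results into Eq. (6.123), we get

$$R_Y(t, s) = \frac{\sigma^2 s^3}{3} + \sigma^2 (t - s)\frac{s^2}{2} = \frac{1}{6}\sigma^2 s^2 (3t - s)$$

Since $R_Y(t, s) = R_Y(s, t)$, we obtain

$$R_Y(t, s) = \begin{cases} \frac{1}{6}\sigma^2 s^2 (3t - s) & t > s \ge 0 \\ \frac{1}{6}\sigma^2 t^2 (3s - t) & s > t \ge 0 \end{cases}$$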
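As a rough numerical cross-check of Prob. 6.12 (not part of the original solution), the double integral of sigma^2 min(alpha, beta) over [0, t] x [0, s] can be approximated by a midpoint rule and compared with the closed form sigma^2 s^2 (3t - s)/6. The function names, the grid size n, and the default sigma2 = 1 are assumptions chosen for this sketch.

```python
def integrated_wiener_autocorr(t, s, sigma2=1.0, n=400):
    """Midpoint-rule approximation of the double integral of
    sigma^2 * min(a, b) over [0, t] x [0, s], i.e. R_Y(t, s)."""
    da, db = t / n, s / n
    total = 0.0
    for i in range(n):
        a = (i + 0.5) * da
        for j in range(n):
            b = (j + 0.5) * db
            total += min(a, b)
    return sigma2 * total * da * db


def closed_form(t, s, sigma2=1.0):
    """R_Y(t, s) from Prob. 6.12, written symmetrically in t and s."""
    lo, hi = min(t, s), max(t, s)
    return sigma2 * lo ** 2 * (3.0 * hi - lo) / 6.0


if __name__ == "__main__":
    print(integrated_wiener_autocorr(2.0, 1.0), closed_form(2.0, 1.0))
    # Setting t = s recovers the variance sigma^2 t^3 / 3 of Eq. (6.122).
    print(integrated_wiener_autocorr(1.0, 1.0), closed_form(1.0, 1.0))
```

The two columns should agree to several decimal places; the small residual comes from the kink of min(alpha, beta) along the diagonal, which the midpoint rule resolves only approximately.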


Power Spectral Density 
6.13. Verify Eqs. (6.13) and (6.14). 
From Eq. (6.12),

$$R_X(\tau) = E[X(t)X(t + \tau)]$$

Setting $t + \tau = s$, we get

$$R_X(\tau) = E[X(s - \tau)X(s)] = E[X(s)X(s - \tau)] = R_X(-\tau)$$

Next, we have

$$E\{[X(t) \pm X(t + \tau)]^2\} \ge 0$$ (6.124)

Expanding the square, we have

$$E[X^2(t) \pm 2X(t)X(t + \tau) + X^2(t + \tau)] \ge 0$$

or

$$E[X^2(t)] \pm 2E[X(t)X(t + \tau)] + E[X^2(t + \tau)] \ge 0$$

Thus,

$$2R_X(0) \pm 2R_X(\tau) \ge 0$$

from which we obtain Eq. (6.14); that is,

$$R_X(0) \ge |R_X(\tau)|$$


6.14. Verify Eqs. (6.18) to (6.20).

By Eq. (6.17),

$$R_{XY}(-\tau) = E[X(t)Y(t - \tau)]$$

Setting $t - \tau = s$, we get

$$R_{XY}(-\tau) = E[X(s + \tau)Y(s)] = E[Y(s)X(s + \tau)] = R_{YX}(\tau)$$

Next, from the Cauchy-Schwarz inequality, Eq. (3.97) (Prob. 3.35), it follows that

$$\{E[X(t)Y(t + \tau)]\}^2 \le E[X^2(t)]\,E[Y^2(t + \tau)]$$

or

$$[R_{XY}(\tau)]^2 \le R_X(0)R_Y(0)$$

from which we obtain Eq. (6.19); that is,

$$|R_{XY}(\tau)| \le \sqrt{R_X(0)R_Y(0)}$$

Now

$$E\{[X(t) - Y(t + \tau)]^2\} \ge 0$$

Expanding the square, we have

$$E[X^2(t) - 2X(t)Y(t + \tau) + Y^2(t + \tau)] \ge 0$$

or

$$E[X^2(t)] - 2E[X(t)Y(t + \tau)] + E[Y^2(t + \tau)] \ge 0$$

Thus,

$$R_X(0) - 2R_{XY}(\tau) + R_Y(0) \ge 0$$

from which we obtain Eq. (6.20); that is,

$$R_{XY}(\tau) \le \tfrac{1}{2}[R_X(0) + R_Y(0)]$$
6.15. Two random processes X(t) and Y(t) are given by

$$X(t) = A\cos(\omega t + \Theta) \qquad Y(t) = A\sin(\omega t + \Theta)$$

where A and $\omega$ are constants and $\Theta$ is a uniform r.v. over $(0, 2\pi)$. Find the cross-correlation function of X(t) and Y(t) and verify Eq. (6.18).

From Eq. (6.17), the cross-correlation function of X(t) and Y(t) is

$$R_{XY}(t, t + \tau) = E[X(t)Y(t + \tau)] = E\{A^2 \cos(\omega t + \Theta)\sin[\omega(t + \tau) + \Theta]\}$$

$$= \frac{A^2}{2} E[\sin(2\omega t + \omega\tau + 2\Theta) - \sin(-\omega\tau)] = \frac{A^2}{2}\sin\omega\tau = R_{XY}(\tau)$$ (6.125)

Similarly,

$$R_{YX}(t, t + \tau) = E[Y(t)X(t + \tau)] = E\{A^2 \sin(\omega t + \Theta)\cos[\omega(t + \tau) + \Theta]\}$$

$$= \frac{A^2}{2} E[\sin(2\omega t + \omega\tau + 2\Theta) + \sin(-\omega\tau)] = -\frac{A^2}{2}\sin\omega\tau = R_{YX}(\tau)$$ (6.126)

From Eqs. (6.125) and (6.126), we see that

$$R_{XY}(-\tau) = \frac{A^2}{2}\sin\omega(-\tau) = -\frac{A^2}{2}\sin\omega\tau = R_{YX}(\tau)$$

which verifies Eq. (6.18).

6.16. Show that the power spectrum of a (real) random process X(t) is real and verify Eq. (6.26).

From Eq. (6.23) and expanding the exponential, we have

$$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)e^{-j\omega\tau}\,d\tau = \int_{-\infty}^{\infty} R_X(\tau)(\cos\omega\tau - j\sin\omega\tau)\,d\tau$$

$$= \int_{-\infty}^{\infty} R_X(\tau)\cos\omega\tau\,d\tau - j\int_{-\infty}^{\infty} R_X(\tau)\sin\omega\tau\,d\tau$$ (6.127)

Since $R_X(-\tau) = R_X(\tau)$, $R_X(\tau)\cos\omega\tau$ is an even function of $\tau$ and $R_X(\tau)\sin\omega\tau$ is an odd function of $\tau$, and hence the imaginary term in Eq. (6.127) vanishes and we obtain

$$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)\cos\omega\tau\,d\tau$$ (6.128)

which indicates that $S_X(\omega)$ is real. Since $\cos(-\omega\tau) = \cos(\omega\tau)$, it follows that

$$S_X(-\omega) = S_X(\omega)$$

which indicates that the power spectrum of a real random process X(t) is an even function of frequency.
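Equation (6.128) lends itself to a quick numerical illustration. The sketch below assumes the example autocorrelation R_X(tau) = exp(-a|tau|) (the one used later in Prob. 6.26), evaluates the cosine integral by a midpoint rule, and compares it with the standard transform 2a/(omega^2 + a^2). The function name, the truncation point `upper`, and the step count are choices made for this sketch only.

```python
import math


def psd_cosine_integral(omega, a=1.0, upper=40.0, n=200_000):
    """Approximate S_X(omega) = 2 * int_0^inf R_X(tau) cos(omega tau) dtau
    for the assumed R_X(tau) = exp(-a |tau|), using the midpoint rule.
    Evenness of the integrand lets us integrate over tau >= 0 only; the
    integral is truncated at tau = upper, where exp(-a * upper) is negligible."""
    h = upper / n
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) * h
        total += math.exp(-a * tau) * math.cos(omega * tau)
    return 2.0 * total * h


if __name__ == "__main__":
    for omega in (-1.5, 0.0, 1.5):
        exact = 2.0 / (omega ** 2 + 1.0)   # 2a / (omega^2 + a^2) with a = 1
        print(omega, psd_cosine_integral(omega), exact)
```

The result is real by construction, and the values at omega and -omega coincide, as Eq. (6.26) requires.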


6.17. Consider the random process

$$Y(t) = (-1)^{X(t)}$$

where X(t) is a Poisson process with rate $\lambda$. Thus, Y(t) starts at Y(0) = 1 and switches back and forth from +1 to -1 at random Poisson times $T_i$, as shown in Fig. 6-4. The process Y(t) is known as the semirandom telegraph signal because its initial value Y(0) = 1 is not random.

Fig. 6-4 Semirandom telegraph signal.

(a) Find the mean of Y(t).
(b) Find the autocorrelation function of Y(t).

(a) We have

$$Y(t) = \begin{cases} 1 & \text{if } X(t) \text{ is even} \\ -1 & \text{if } X(t) \text{ is odd} \end{cases}$$

Thus, using Eq. (5.55), we have

$$P[Y(t) = 1] = P[X(t) = \text{even integer}] = e^{-\lambda t}\left[1 + \frac{(\lambda t)^2}{2!} + \cdots\right] = e^{-\lambda t}\cosh\lambda t$$

$$P[Y(t) = -1] = P[X(t) = \text{odd integer}] = e^{-\lambda t}\left[\lambda t + \frac{(\lambda t)^3}{3!} + \cdots\right] = e^{-\lambda t}\sinh\lambda t$$

Hence,

$$\mu_Y(t) = E[Y(t)] = (1)P[Y(t) = 1] + (-1)P[Y(t) = -1] = e^{-\lambda t}(\cosh\lambda t - \sinh\lambda t) = e^{-2\lambda t}$$ (6.129)

(b) Similarly, since $Y(t)Y(t + \tau) = 1$ if there are an even number of events in $(t, t + \tau)$ for $\tau > 0$ and $Y(t)Y(t + \tau) = -1$ if there are an odd number of events, then for $t > 0$ and $t + \tau > 0$,

$$R_Y(t, t + \tau) = E[Y(t)Y(t + \tau)] = (1)\sum_{n\ \text{even}} e^{-\lambda\tau}\frac{(\lambda\tau)^n}{n!} + (-1)\sum_{n\ \text{odd}} e^{-\lambda\tau}\frac{(\lambda\tau)^n}{n!}$$

$$= e^{-\lambda\tau}\sum_{n=0}^{\infty}\frac{(-\lambda\tau)^n}{n!} = e^{-\lambda\tau}e^{-\lambda\tau} = e^{-2\lambda\tau}$$

which indicates that $R_Y(t, t + \tau) = R_Y(\tau)$, and by Eq. (6.13),

$$R_Y(\tau) = e^{-2\lambda|\tau|}$$ (6.130)

Note that since E[Y(t)] is not a constant, Y(t) is not WSS.
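The closed form in Eq. (6.129) can be checked by summing the alternating Poisson series directly. The following is a small sketch under the assumption that truncating the series after 100 terms is adequate for moderate values of lambda*t; the function name and the truncation length are illustrative choices.

```python
import math


def telegraph_mean(lam, t, terms=100):
    """E[Y(t)] = sum_n (-1)^n P[X(t) = n] for a Poisson X(t) with rate lam."""
    total = 0.0
    term = math.exp(-lam * t)              # P[X(t) = 0]
    for n in range(terms):
        total += term if n % 2 == 0 else -term
        term *= lam * t / (n + 1)          # P[X(t) = n + 1] from P[X(t) = n]
    return total


if __name__ == "__main__":
    lam, t = 1.5, 0.8
    print(telegraph_mean(lam, t), math.exp(-2.0 * lam * t))
```

The same series with t replaced by tau gives the autocorrelation of part (b), since only the number of events in an interval of length tau matters there.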
6.18. Consider the random process

$$Z(t) = AY(t)$$

where Y(t) is the semirandom telegraph signal of Prob. 6.17 and A is a r.v. independent of Y(t) and takes on the values $\pm 1$ with equal probability. The process Z(t) is known as the random telegraph signal.

(a) Show that Z(t) is WSS.
(b) Find the power spectral density of Z(t).

(a) Since E(A) = 0 and $E(A^2) = 1$, the mean of Z(t) is

$$\mu_Z(t) = E[Z(t)] = E(A)E[Y(t)] = 0$$ (6.131)

and the autocorrelation of Z(t) is

$$R_Z(t, t + \tau) = E[A^2 Y(t)Y(t + \tau)] = E(A^2)E[Y(t)Y(t + \tau)] = R_Y(t, t + \tau)$$

Thus, using Eq. (6.130), we obtain

$$R_Z(t, t + \tau) = R_Z(\tau) = e^{-2\lambda|\tau|}$$ (6.132)

Thus, we see that Z(t) is WSS.

(b) Taking the Fourier transform of Eq. (6.132) (see Appendix B), we see that the power spectrum of Z(t) is given by

$$S_Z(\omega) = \frac{4\lambda}{\omega^2 + 4\lambda^2}$$ (6.133)

6.19. Let X(t) and Y(t) be both zero-mean and WSS random processes. Consider the random process Z(t) defined by

$$Z(t) = X(t) + Y(t)$$

(a) Determine the autocorrelation function and the power spectral density of Z(t), (i) if X(t) and Y(t) are jointly WSS; (ii) if X(t) and Y(t) are orthogonal.
(b) Show that if X(t) and Y(t) are orthogonal, then the mean square of Z(t) is equal to the sum of the mean squares of X(t) and Y(t).

(a) The autocorrelation of Z(t) is given by

$$R_Z(t, s) = E[Z(t)Z(s)] = E\{[X(t) + Y(t)][X(s) + Y(s)]\}$$

$$= E[X(t)X(s)] + E[X(t)Y(s)] + E[Y(t)X(s)] + E[Y(t)Y(s)]$$

$$= R_X(t, s) + R_{XY}(t, s) + R_{YX}(t, s) + R_Y(t, s)$$

(i) If X(t) and Y(t) are jointly WSS, then we have

$$R_Z(\tau) = R_X(\tau) + R_{XY}(\tau) + R_{YX}(\tau) + R_Y(\tau)$$

where $\tau = s - t$. Taking the Fourier transform of the above expression, we obtain

$$S_Z(\omega) = S_X(\omega) + S_{XY}(\omega) + S_{YX}(\omega) + S_Y(\omega)$$

(ii) If X(t) and Y(t) are orthogonal [Eq. (6.21)],

$$R_{XY}(\tau) = R_{YX}(\tau) = 0$$

Then

$$R_Z(\tau) = R_X(\tau) + R_Y(\tau)$$ (6.134a)

$$S_Z(\omega) = S_X(\omega) + S_Y(\omega)$$ (6.134b)

(b) Setting $\tau = 0$ in Eq. (6.134a), and using Eq. (6.15), we get

$$E[Z^2(t)] = E[X^2(t)] + E[Y^2(t)]$$

which indicates that the mean square of Z(t) is equal to the sum of the mean squares of X(t) and Y(t).


White Noise 


6.20. Using the notion of generalized derivative, show that the generalized derivative X'(t) of the Wiener process X(t) is a white noise.

From Eq. (5.64),

$$R_X(t, s) = \sigma^2 \min(t, s)$$

and from Eq. (6.119) (Prob. 6.9), we have

$$\frac{\partial}{\partial s} R_X(t, s) = \sigma^2 u(t - s)$$ (6.135)

Now, using the $\delta$ function, the generalized derivative of a unit step function u(t) is given by

$$\frac{d}{dt} u(t) = \delta(t)$$

Applying the above relation to Eq. (6.135), we obtain

$$\frac{\partial^2}{\partial t\,\partial s} R_X(t, s) = \sigma^2 \frac{\partial}{\partial t} u(t - s) = \sigma^2 \delta(t - s)$$ (6.136)

which is, by Eq. (6.116) (Prob. 6.7), the autocorrelation function of the generalized derivative X'(t) of the Wiener process X(t); that is,

$$R_{X'}(t, s) = \sigma^2 \delta(t - s) = \sigma^2 \delta(\tau)$$ (6.137)

where $\tau = t - s$. Thus, by definition (6.43), we see that the generalized derivative X'(t) of the Wiener process X(t) is a white noise.

Recall that the Wiener process is a normal process and its derivative is also normal (see Prob. 6.10). Hence, the generalized derivative X'(t) of the Wiener process is called white normal (or white Gaussian) noise.

6.21. Let X(t) be a Poisson process with rate $\lambda$. Let

$$Y(t) = X(t) - \lambda t$$

Show that the generalized derivative Y'(t) of Y(t) is a white noise.

Since $Y(t) = X(t) - \lambda t$, we have formally

$$Y'(t) = X'(t) - \lambda$$ (6.138)

Then

$$E[Y'(t)] = E[X'(t) - \lambda] = E[X'(t)] - \lambda$$ (6.139)

$$R_{Y'}(t, s) = E[Y'(t)Y'(s)] = E\{[X'(t) - \lambda][X'(s) - \lambda]\}$$

$$= E[X'(t)X'(s) - \lambda X'(s) - \lambda X'(t) + \lambda^2]$$

$$= E[X'(t)X'(s)] - \lambda E[X'(s)] - \lambda E[X'(t)] + \lambda^2$$ (6.140)

Now, from Eqs. (5.56) and (5.60), we have

$$E[X(t)] = \lambda t$$

$$R_X(t, s) = \lambda \min(t, s) + \lambda^2 ts$$

Thus,

$$E[X'(t)] = \lambda \quad \text{and} \quad E[X'(s)] = \lambda$$ (6.141)

and from Eqs. (6.7) and (6.137),

$$E[X'(t)X'(s)] = R_{X'}(t, s) = \frac{\partial^2 R_X(t, s)}{\partial t\,\partial s} = \lambda\delta(t - s) + \lambda^2$$ (6.142)

Substituting Eq. (6.141) into Eq. (6.139), we obtain

$$E[Y'(t)] = 0$$ (6.143)

Substituting Eqs. (6.141) and (6.142) into Eq. (6.140), we get

$$R_{Y'}(t, s) = \lambda\delta(t - s)$$ (6.144)

Hence, we see that Y'(t) is a zero-mean WSS random process, and by definition (6.43), Y'(t) is a white noise with $\sigma^2 = \lambda$. The process Y'(t) is known as the Poisson white noise.

6.22. Let X(t) be a white normal noise. Let

$$Y(t) = \int_0^t X(\alpha)\,d\alpha$$

(a) Find the autocorrelation function of Y(t).
(b) Show that Y(t) is the Wiener process.

(a) From Eq. (6.137) of Prob. 6.20,

$$R_X(t, s) = \sigma^2 \delta(t - s)$$

Thus, by Eq. (6.11), the autocorrelation function of Y(t) is

$$R_Y(t, s) = \int_0^t \int_0^s R_X(\alpha, \beta)\,d\beta\,d\alpha = \int_0^s \int_0^t \sigma^2 \delta(\alpha - \beta)\,d\alpha\,d\beta$$

$$= \sigma^2 \int_0^s u(t - \beta)\,d\beta = \sigma^2 \int_0^{\min(t, s)} d\beta = \sigma^2 \min(t, s)$$ (6.145)

(b) Comparing Eq. (6.145) and Eq. (5.64), we see that Y(t) has the same autocorrelation function as the Wiener process. In addition, Y(t) is normal, since X(t) is a normal process, and Y(0) = 0. Thus, we conclude that Y(t) is the Wiener process.
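The relation between white noise and the Wiener process can be explored with a crude simulation. The sketch below is an assumption-laden stand-in, not the book's method: continuous white noise is replaced by independent normal samples of variance sigma^2/dt on a grid of step dt, and the integral by a Riemann sum. The function name, grid size, path count, and seed are all illustrative choices. Under these assumptions the sample variance of Y(t) should be close to sigma^2 t, in line with Eq. (6.145) for s = t.

```python
import random


def simulate_integrated_white_noise(t_end=1.0, steps=100, paths=2000,
                                    sigma2=1.0, seed=0):
    """Return the sample variance of Y(t_end), where Y is a Riemann-sum
    integral of discretized white normal noise."""
    rng = random.Random(seed)
    dt = t_end / steps
    # Discrete stand-in for white noise: independent N(0, sigma2 / dt) samples,
    # so that each increment X * dt has variance sigma2 * dt.
    noise_std = (sigma2 / dt) ** 0.5
    finals = []
    for _ in range(paths):
        y = 0.0
        for _ in range(steps):
            y += rng.gauss(0.0, noise_std) * dt
        finals.append(y)
    mean = sum(finals) / paths
    return sum((y - mean) ** 2 for y in finals) / (paths - 1)


if __name__ == "__main__":
    # Expected to be near sigma2 * t_end = 1, up to sampling error.
    print(simulate_integrated_white_noise())
```

The estimate fluctuates with the seed and the number of paths, so it should be read as a plausibility check rather than a proof.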


6.23. Let Y(n) = X(n) + W(n), where X(n) = A (for all n) and A is a r.v. with zero mean and variance $\sigma_A^2$, and W(n) is a discrete-time white noise with average power $\sigma^2$. It is also assumed that X(n) and W(n) are independent.
(a) Show that Y(n) is WSS.
(b) Find the power spectral density $S_Y(\Omega)$ of Y(n).

(a) The mean of Y(n) is

$$E[Y(n)] = E[X(n)] + E[W(n)] = E(A) + E[W(n)] = 0$$

The autocorrelation function of Y(n) is

$$R_Y(n, n + k) = E\{[X(n) + W(n)][X(n + k) + W(n + k)]\}$$

$$= E[X(n)X(n + k)] + E[X(n)]E[W(n + k)] + E[W(n)]E[X(n + k)] + E[W(n)W(n + k)]$$

$$= E(A^2) + R_W(k) = \sigma_A^2 + \sigma^2\delta(k) = R_Y(k)$$ (6.146)

Thus, Y(n) is WSS.
(b) Taking the Fourier transform of Eq. (6.146), we obtain

$$S_Y(\Omega) = 2\pi\sigma_A^2\delta(\Omega) + \sigma^2 \qquad -\pi < \Omega < \pi$$ (6.147)


Response of Linear Systems to Random Inputs 


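6.24. Derive Eq. (6.58).

Using Eq. (6.56), we have

$$R_Y(t, s) = E[Y(t)Y(s)] = E\left[\int_{-\infty}^{\infty} h(\alpha)X(t - \alpha)\,d\alpha \int_{-\infty}^{\infty} h(\beta)X(s - \beta)\,d\beta\right]$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)E[X(t - \alpha)X(s - \beta)]\,d\alpha\,d\beta = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)R_X(t - \alpha, s - \beta)\,d\alpha\,d\beta$$

6.25. Derive Eq. (6.63).

From Eq. (6.62), we have

$$R_Y(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)R_X(\tau + \alpha - \beta)\,d\alpha\,d\beta$$

Taking the Fourier transform of $R_Y(\tau)$, we obtain

$$S_Y(\omega) = \int_{-\infty}^{\infty} R_Y(\tau)e^{-j\omega\tau}\,d\tau = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)R_X(\tau + \alpha - \beta)e^{-j\omega\tau}\,d\alpha\,d\beta\,d\tau$$

Letting $\tau + \alpha - \beta = \lambda$, we get

$$S_Y(\omega) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)h(\beta)R_X(\lambda)e^{-j\omega(\lambda - \alpha + \beta)}\,d\alpha\,d\beta\,d\lambda$$

$$= \int_{-\infty}^{\infty} h(\alpha)e^{j\omega\alpha}\,d\alpha \int_{-\infty}^{\infty} h(\beta)e^{-j\omega\beta}\,d\beta \int_{-\infty}^{\infty} R_X(\lambda)e^{-j\omega\lambda}\,d\lambda$$

$$= H(-\omega)H(\omega)S_X(\omega) = H^*(\omega)H(\omega)S_X(\omega) = |H(\omega)|^2 S_X(\omega)$$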


6.26. A WSS random process X(t) with autocorrelation function

$$R_X(\tau) = e^{-a|\tau|}$$

where a is a real positive constant, is applied to the input of an LTI system with impulse response

$$h(t) = e^{-bt}u(t)$$

where b is a real positive constant. Find the autocorrelation function of the output Y(t) of the system.

The frequency response $H(\omega)$ of the system is

$$H(\omega) = \mathcal{F}[h(t)] = \frac{1}{j\omega + b}$$

The power spectral density of X(t) is

$$S_X(\omega) = \mathcal{F}[R_X(\tau)] = \frac{2a}{\omega^2 + a^2}$$

By Eq. (6.63), the power spectral density of Y(t) is

$$S_Y(\omega) = |H(\omega)|^2 S_X(\omega) = \left(\frac{1}{\omega^2 + b^2}\right)\left(\frac{2a}{\omega^2 + a^2}\right)$$

$$= \frac{a}{(a^2 - b^2)b}\left(\frac{2b}{\omega^2 + b^2}\right) - \frac{b}{(a^2 - b^2)b}\left(\frac{2a}{\omega^2 + a^2}\right)$$

Taking the inverse Fourier transform of both sides of the above equation, we obtain

$$R_Y(\tau) = \frac{1}{(a^2 - b^2)b}\left(ae^{-b|\tau|} - be^{-a|\tau|}\right)$$

6.27. Verify Eq. (6.25), that is, the power spectral density of any WSS process X(t) is real and $S_X(\omega) \ge 0$.

The realness of $S_X(\omega)$ was shown in Prob. 6.16. Consider an ideal bandpass filter with frequency response (Fig. 6-5)

$$H(\omega) = \begin{cases} 1 & \omega_1 < |\omega| < \omega_2 \\ 0 & \text{otherwise} \end{cases}$$

with a random process X(t) as its input.

Fig. 6-5

From Eq. (6.63), it follows that the power spectral density $S_Y(\omega)$ of the output Y(t) equals

$$S_Y(\omega) = \begin{cases} S_X(\omega) & \omega_1 < |\omega| < \omega_2 \\ 0 & \text{otherwise} \end{cases}$$

Hence, from Eq. (6.27), we have

$$E[Y^2(t)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_Y(\omega)\,d\omega = \frac{1}{\pi}\int_{\omega_1}^{\omega_2} S_X(\omega)\,d\omega \ge 0$$

which indicates that the area of $S_X(\omega)$ in any interval of $\omega$ is nonnegative. This is possible only if $S_X(\omega) \ge 0$ for every $\omega$.
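The average-power relation of Eq. (6.27), used in Prob. 6.27, also gives a way to sanity-check the result of Prob. 6.26: the value R_Y(0) from the closed form should equal (1/2 pi) times the area under S_Y(omega). The sketch below does this numerically; the function names, the integration limit, and the sample values a = 2, b = 1 are assumptions made for the example (the closed form requires a != b).

```python
import math


def output_psd(omega, a, b):
    """S_Y(omega) = |H(omega)|^2 S_X(omega) for Prob. 6.26."""
    return (1.0 / (omega ** 2 + b ** 2)) * (2.0 * a / (omega ** 2 + a ** 2))


def output_autocorr(tau, a, b):
    """Closed-form R_Y(tau) obtained in Prob. 6.26 (requires a != b)."""
    return ((a * math.exp(-b * abs(tau)) - b * math.exp(-a * abs(tau)))
            / ((a ** 2 - b ** 2) * b))


def power_from_psd(a, b, limit=2000.0, n=400_000):
    """(1 / 2 pi) * integral of S_Y over (-limit, limit), midpoint rule."""
    h = 2.0 * limit / n
    total = sum(output_psd(-limit + (k + 0.5) * h, a, b) for k in range(n))
    return total * h / (2.0 * math.pi)


if __name__ == "__main__":
    a, b = 2.0, 1.0
    print(power_from_psd(a, b), output_autocorr(0.0, a, b))
```

Both numbers estimate E[Y^2(t)], and the integrand is nonnegative everywhere, consistent with Eq. (6.25).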


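6.28. Verify Eq. (6.72).

From Eq. (6.71), we have

$$R_Y(k) = \sum_{i=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(i)h(l)R_X(k + i - l)$$

Taking the Fourier transform of $R_Y(k)$, we obtain

$$S_Y(\Omega) = \sum_{k=-\infty}^{\infty} R_Y(k)e^{-j\Omega k} = \sum_{k=-\infty}^{\infty}\sum_{i=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(i)h(l)R_X(k + i - l)e^{-j\Omega k}$$

Letting $k + i - l = n$, we get

$$S_Y(\Omega) = \sum_{n=-\infty}^{\infty}\sum_{i=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h(i)h(l)R_X(n)e^{-j\Omega(n - i + l)}$$

$$= \sum_{i=-\infty}^{\infty} h(i)e^{j\Omega i}\sum_{l=-\infty}^{\infty} h(l)e^{-j\Omega l}\sum_{n=-\infty}^{\infty} R_X(n)e^{-j\Omega n}$$

$$= H(-\Omega)H(\Omega)S_X(\Omega) = H^*(\Omega)H(\Omega)S_X(\Omega) = |H(\Omega)|^2 S_X(\Omega)$$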


6.29. The discrete-time system shown in Fig. 6-6 consists of one unit delay element and one scalar multiplier (a < 1). The input X(n) is discrete-time white noise with average power $\sigma^2$. Find the spectral density and average power of the output Y(n).

Fig. 6-6

From Fig. 6-6, Y(n) and X(n) are related by

$$Y(n) = aY(n - 1) + X(n)$$ (6.148)

The impulse response h(n) of the system is defined by

$$h(n) = ah(n - 1) + \delta(n)$$ (6.149)

Solving Eq. (6.149), we obtain

$$h(n) = a^n u(n)$$ (6.150)

where u(n) is the unit step sequence defined by

$$u(n) = \begin{cases} 1 & n \ge 0 \\ 0 & n < 0 \end{cases}$$

Taking the Fourier transform of Eq. (6.150), we obtain

$$H(\Omega) = \sum_{n=0}^{\infty} a^n e^{-j\Omega n} = \frac{1}{1 - ae^{-j\Omega}} \qquad a < 1,\ |\Omega| < \pi$$

Now, by Eq. (6.48),

$$S_X(\Omega) = \sigma^2 \qquad |\Omega| < \pi$$

and by Eq. (6.72), the power spectral density of Y(n) is

$$S_Y(\Omega) = |H(\Omega)|^2 S_X(\Omega) = H(\Omega)H(-\Omega)S_X(\Omega)$$

$$= \frac{\sigma^2}{(1 - ae^{-j\Omega})(1 - ae^{j\Omega})} = \frac{\sigma^2}{1 + a^2 - 2a\cos\Omega} \qquad |\Omega| < \pi$$ (6.151)

Taking the inverse Fourier transform of Eq. (6.151), we obtain

$$R_Y(k) = \frac{\sigma^2}{1 - a^2}a^{|k|}$$

Thus, by Eq. (6.33), the average power of Y(n) is

$$E[Y^2(n)] = R_Y(0) = \frac{\sigma^2}{1 - a^2}$$
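The autocorrelation just obtained can be cross-checked in the time domain. With a white input, Eq. (6.71) collapses to R_Y(k) = sigma^2 * sum over n of h(n) h(n + |k|), and the sketch below sums this series for h(n) = a^n u(n) and compares it with sigma^2 a^|k| / (1 - a^2). The function names and the truncation length `terms` are assumptions made for this illustration.

```python
def ar1_output_autocorr(k, a, sigma2=1.0, terms=2000):
    """R_Y(k) = sigma2 * sum_n h(n) h(n + |k|) with h(n) = a**n u(n),
    truncated after `terms` terms (adequate when |a| is well below 1)."""
    k = abs(k)
    return sigma2 * sum(a ** n * a ** (n + k) for n in range(terms))


def ar1_closed_form(k, a, sigma2=1.0):
    """Closed form from Prob. 6.29: sigma2 * a**|k| / (1 - a**2)."""
    return sigma2 * a ** abs(k) / (1.0 - a ** 2)


if __name__ == "__main__":
    a = 0.5
    for k in (0, 1, 3):
        print(k, ar1_output_autocorr(k, a), ar1_closed_form(k, a))
```

The k = 0 row is the average output power, which grows without bound as a approaches 1.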


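6.30. Let Y(t) be the output of an LTI system with impulse response h(t), when X(t) is applied as input. Show that

(a) $$R_{XY}(t, s) = \int_{-\infty}^{\infty} h(\beta)R_X(t, s - \beta)\,d\beta$$ (6.152)

(b) $$R_Y(t, s) = \int_{-\infty}^{\infty} h(\alpha)R_{XY}(t - \alpha, s)\,d\alpha$$ (6.153)

(a) Using Eq. (6.56), we have

$$R_{XY}(t, s) = E[X(t)Y(s)] = E\left[X(t)\int_{-\infty}^{\infty} h(\beta)X(s - \beta)\,d\beta\right]$$

$$= \int_{-\infty}^{\infty} h(\beta)E[X(t)X(s - \beta)]\,d\beta = \int_{-\infty}^{\infty} h(\beta)R_X(t, s - \beta)\,d\beta$$

(b) Similarly,

$$R_Y(t, s) = E[Y(t)Y(s)] = E\left[\int_{-\infty}^{\infty} h(\alpha)X(t - \alpha)\,d\alpha\,Y(s)\right]$$

$$= \int_{-\infty}^{\infty} h(\alpha)E[X(t - \alpha)Y(s)]\,d\alpha = \int_{-\infty}^{\infty} h(\alpha)R_{XY}(t - \alpha, s)\,d\alpha$$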


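6.31. Let Y(t) be the output of an LTI system with impulse response h(t) when a WSS random process X(t) is applied as input. Show that

(a) $$S_{XY}(\omega) = H(\omega)S_X(\omega)$$ (6.154)

(b) $$S_Y(\omega) = H^*(\omega)S_{XY}(\omega)$$ (6.155)

(a) If X(t) is WSS, then Eq. (6.152) of Prob. 6.30 becomes

$$R_{XY}(t, s) = \int_{-\infty}^{\infty} h(\beta)R_X(s - t - \beta)\,d\beta$$ (6.156)

which indicates that $R_{XY}(t, s)$ is a function of the time difference $\tau = s - t$ only. Hence,

$$R_{XY}(\tau) = \int_{-\infty}^{\infty} h(\beta)R_X(\tau - \beta)\,d\beta$$ (6.157)

Taking the Fourier transform of Eq. (6.157), we obtain

$$S_{XY}(\omega) = \int_{-\infty}^{\infty} R_{XY}(\tau)e^{-j\omega\tau}\,d\tau = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\beta)R_X(\tau - \beta)e^{-j\omega\tau}\,d\beta\,d\tau$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\beta)R_X(\lambda)e^{-j\omega(\lambda + \beta)}\,d\beta\,d\lambda$$

$$= \int_{-\infty}^{\infty} h(\beta)e^{-j\omega\beta}\,d\beta \int_{-\infty}^{\infty} R_X(\lambda)e^{-j\omega\lambda}\,d\lambda = H(\omega)S_X(\omega)$$

(b) Similarly, if X(t) is WSS, then by Eq. (6.156), Eq. (6.153) becomes

$$R_Y(t, s) = \int_{-\infty}^{\infty} h(\alpha)R_{XY}(s - t + \alpha)\,d\alpha$$

which indicates that $R_Y(t, s)$ is a function of the time difference $\tau = s - t$ only. Hence,

$$R_Y(\tau) = \int_{-\infty}^{\infty} h(\alpha)R_{XY}(\tau + \alpha)\,d\alpha$$ (6.158)

Taking the Fourier transform of $R_Y(\tau)$, we obtain

$$S_Y(\omega) = \int_{-\infty}^{\infty} R_Y(\tau)e^{-j\omega\tau}\,d\tau = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)R_{XY}(\tau + \alpha)e^{-j\omega\tau}\,d\alpha\,d\tau$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)R_{XY}(\lambda)e^{-j\omega(\lambda - \alpha)}\,d\alpha\,d\lambda$$

$$= \int_{-\infty}^{\infty} h(\alpha)e^{j\omega\alpha}\,d\alpha \int_{-\infty}^{\infty} R_{XY}(\lambda)e^{-j\omega\lambda}\,d\lambda = H(-\omega)S_{XY}(\omega) = H^*(\omega)S_{XY}(\omega)$$

Note that from Eqs. (6.154) and (6.155), we obtain Eq. (6.63); that is,

$$S_Y(\omega) = H^*(\omega)S_{XY}(\omega) = H^*(\omega)H(\omega)S_X(\omega) = |H(\omega)|^2 S_X(\omega)$$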


6.32. Consider a WSS process X(t) with autocorrelation function $R_X(\tau)$ and power spectral density $S_X(\omega)$. Let X'(t) = dX(t)/dt. Show that

(a) $$R_{XX'}(\tau) = \frac{d}{d\tau}R_X(\tau)$$ (6.159)

(b) $$R_{X'}(\tau) = -\frac{d^2}{d\tau^2}R_X(\tau)$$ (6.160)

(c) $$S_{X'}(\omega) = \omega^2 S_X(\omega)$$ (6.161)

(a) If X(t) is the input to a differentiator, then its output is Y(t) = X'(t). The frequency response of a differentiator is known as $H(\omega) = j\omega$. Then from Eq. (6.154),

$$S_{XX'}(\omega) = H(\omega)S_X(\omega) = j\omega S_X(\omega)$$

Taking the inverse Fourier transform of both sides, we obtain

$$R_{XX'}(\tau) = \frac{d}{d\tau}R_X(\tau)$$

(b) From Eq. (6.155),

$$S_{X'}(\omega) = H^*(\omega)S_{XX'}(\omega) = -j\omega S_{XX'}(\omega)$$

Again taking the inverse Fourier transform of both sides and using the result of part (a), we have

$$R_{X'}(\tau) = -\frac{d}{d\tau}R_{XX'}(\tau) = -\frac{d^2}{d\tau^2}R_X(\tau)$$

(c) From Eq. (6.63),

$$S_{X'}(\omega) = |H(\omega)|^2 S_X(\omega) = |j\omega|^2 S_X(\omega) = \omega^2 S_X(\omega)$$

Note that Eqs. (6.159) and (6.160) were proved in Prob. 6.8 by a different method.


Fourier Series and Karhunen-Loéve Expansions 
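6.33. Verify Eqs. (6.80) and (6.81).

From Eq. (6.78),

$$X_n = \frac{1}{T}\int_0^T X(t)e^{-jn\omega_0 t}\,dt \qquad \omega_0 = 2\pi/T$$

Since X(t) is WSS, $E[X(t)] = \mu_X$, and we have

$$E(X_n) = \frac{1}{T}\int_0^T E[X(t)]e^{-jn\omega_0 t}\,dt = \mu_X\frac{1}{T}\int_0^T e^{-jn\omega_0 t}\,dt = \mu_X\delta(n)$$

Again using Eq. (6.78), we have

$$E(X_n X_m^*) = E\left[X_n\frac{1}{T}\int_0^T X^*(s)e^{jm\omega_0 s}\,ds\right] = \frac{1}{T}\int_0^T E[X_n X^*(s)]e^{jm\omega_0 s}\,ds$$

Now

$$E[X_n X^*(s)] = E\left[\frac{1}{T}\int_0^T X(t)e^{-jn\omega_0 t}\,dt\,X^*(s)\right] = \frac{1}{T}\int_0^T E[X(t)X^*(s)]e^{-jn\omega_0 t}\,dt = \frac{1}{T}\int_0^T R_X(t - s)e^{-jn\omega_0 t}\,dt$$

Letting $t - s = \tau$, and using Eq. (6.76), we obtain

$$E[X_n X^*(s)] = \frac{1}{T}\int_0^T R_X(\tau)e^{-jn\omega_0(\tau + s)}\,d\tau = \left[\frac{1}{T}\int_0^T R_X(\tau)e^{-jn\omega_0\tau}\,d\tau\right]e^{-jn\omega_0 s} = c_n e^{-jn\omega_0 s}$$ (6.162)

Thus,

$$E(X_n X_m^*) = \frac{1}{T}\int_0^T c_n e^{-jn\omega_0 s}e^{jm\omega_0 s}\,ds = c_n\frac{1}{T}\int_0^T e^{-j(n - m)\omega_0 s}\,ds = c_n\delta(n - m)$$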


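6.34. Let $\hat{X}(t)$ be the Fourier series representation of X(t) shown in Eq. (6.77). Verify Eq. (6.79).

From Eq. (6.77), we have

$$E[|X(t) - \hat{X}(t)|^2] = E\left[\left|X(t) - \sum_{n=-\infty}^{\infty} X_n e^{jn\omega_0 t}\right|^2\right]$$

$$= E\left\{\left[X(t) - \sum_{n=-\infty}^{\infty} X_n e^{jn\omega_0 t}\right]\left[X^*(t) - \sum_{n=-\infty}^{\infty} X_n^* e^{-jn\omega_0 t}\right]\right\}$$

$$= E[|X(t)|^2] - \sum_{n=-\infty}^{\infty} E[X_n^* X(t)]e^{-jn\omega_0 t} - \sum_{n=-\infty}^{\infty} E[X_n X^*(t)]e^{jn\omega_0 t} + \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} E(X_n X_m^*)e^{j(n - m)\omega_0 t}$$

Now, by Eqs. (6.81) and (6.162), we have

$$E[X_n^* X(t)] = c_n^* e^{jn\omega_0 t}$$

$$E[X_n X^*(t)] = c_n e^{-jn\omega_0 t}$$

$$E(X_n X_m^*) = c_n\delta(n - m)$$

Using these results, finally we obtain

$$E[|X(t) - \hat{X}(t)|^2] = R_X(0) - \sum_{n=-\infty}^{\infty} c_n^* - \sum_{n=-\infty}^{\infty} c_n + \sum_{n=-\infty}^{\infty} c_n = 0$$

since each sum above equals $R_X(0)$ [see Eq. (6.75)].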


6.35. Let X(t) be m.s. periodic and represented by the Fourier series [Eq. (6.77)]

$$X(t) = \sum_{n=-\infty}^{\infty} X_n e^{jn\omega_0 t} \qquad \omega_0 = 2\pi/T$$

Show that

$$E[|X(t)|^2] = \sum_{n=-\infty}^{\infty} E(|X_n|^2)$$ (6.163)

From Eq. (6.81), we have

$$E(|X_n|^2) = E(X_n X_n^*) = c_n$$ (6.164)

Setting $\tau = 0$ in Eq. (6.75), we obtain

$$E[|X(t)|^2] = R_X(0) = \sum_{n=-\infty}^{\infty} c_n = \sum_{n=-\infty}^{\infty} E(|X_n|^2)$$

Equation (6.163) is known as Parseval's theorem for the Fourier series.


6.36. If a random process X(t) is represented by a Karhunen-Loéve expansion [Eq. (6.82)]

$$X(t) = \sum_{n=1}^{\infty} X_n\phi_n(t) \qquad 0 < t < T$$

and the $X_n$'s are orthogonal, show that $\phi_n(t)$ must satisfy integral equation (6.86); that is,

$$\int_0^T R_X(t, s)\phi_n(s)\,ds = \lambda_n\phi_n(t) \qquad 0 < t < T$$

Consider

$$X(t)X_n^* = \sum_{m=1}^{\infty} X_m X_n^*\phi_m(t)$$

Then

$$E[X(t)X_n^*] = \sum_{m=1}^{\infty} E(X_m X_n^*)\phi_m(t) = E(|X_n|^2)\phi_n(t)$$ (6.165)

since the $X_n$'s are orthogonal; that is, $E(X_m X_n^*) = 0$ if $m \ne n$. But by Eq. (6.84),

$$E[X(t)X_n^*] = E\left[X(t)\int_0^T X^*(s)\phi_n(s)\,ds\right] = \int_0^T E[X(t)X^*(s)]\phi_n(s)\,ds = \int_0^T R_X(t, s)\phi_n(s)\,ds$$ (6.166)

Thus, equating Eqs. (6.165) and (6.166), we obtain

$$\int_0^T R_X(t, s)\phi_n(s)\,ds = E(|X_n|^2)\phi_n(t) = \lambda_n\phi_n(t)$$

where $\lambda_n = E(|X_n|^2)$.

6.37. Let $\hat{X}(t)$ be the Karhunen-Loéve expansion of X(t) shown in Eq. (6.82). Verify Eq. (6.88).

From Eqs. (6.166) and (6.86), we have

$$E[X(t)X_m^*] = \int_0^T R_X(t, s)\phi_m(s)\,ds = \lambda_m\phi_m(t)$$ (6.167)

Now by Eqs. (6.83), (6.84), and (6.167) we obtain

$$E(X_n X_m^*) = E\left[\int_0^T X(t)\phi_n^*(t)\,dt\,X_m^*\right] = \int_0^T E[X(t)X_m^*]\phi_n^*(t)\,dt$$

$$= \int_0^T \lambda_m\phi_m(t)\phi_n^*(t)\,dt = \lambda_m\int_0^T \phi_m(t)\phi_n^*(t)\,dt = \lambda_m\delta(m - n) = \lambda_n\delta(n - m)$$ (6.168)


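6.38. Let $\hat{X}(t)$ be the Karhunen-Loéve expansion of X(t) shown in Eq. (6.82). Verify Eq. (6.85).

From Eq. (6.82), we have

$$E[|X(t) - \hat{X}(t)|^2] = E\left[\left|X(t) - \sum_{n=1}^{\infty} X_n\phi_n(t)\right|^2\right]$$

$$= E\left\{\left[X(t) - \sum_{n=1}^{\infty} X_n\phi_n(t)\right]\left[X^*(t) - \sum_{n=1}^{\infty} X_n^*\phi_n^*(t)\right]\right\}$$

$$= E[|X(t)|^2] - \sum_{n=1}^{\infty} E[X(t)X_n^*]\phi_n^*(t) - \sum_{n=1}^{\infty} E[X^*(t)X_n]\phi_n(t) + \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} E(X_n X_m^*)\phi_n(t)\phi_m^*(t)$$

Using Eqs. (6.167) and (6.168), we have

$$E[|X(t) - \hat{X}(t)|^2] = R_X(t, t) - \sum_{n=1}^{\infty}\lambda_n\phi_n(t)\phi_n^*(t) - \sum_{n=1}^{\infty}\lambda_n^*\phi_n^*(t)\phi_n(t) + \sum_{n=1}^{\infty}\lambda_n\phi_n(t)\phi_n^*(t) = 0$$

since by Mercer's theorem [Eq. (6.87)]

$$R_X(t, t) = \sum_{n=1}^{\infty}\lambda_n\phi_n(t)\phi_n^*(t)$$

and

$$\lambda_n = E(|X_n|^2) = \lambda_n^*$$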
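6.39. Find the Karhunen-Loéve expansion of the Wiener process X(t).

From Eq. (5.64),

$$R_X(t, s) = \sigma^2\min(t, s) = \begin{cases} \sigma^2 s & s < t \\ \sigma^2 t & s > t \end{cases}$$

Substituting the above expression into Eq. (6.86), we obtain

$$\sigma^2\int_0^T \min(t, s)\phi_n(s)\,ds = \lambda_n\phi_n(t) \qquad 0 < t < T$$ (6.169)

or

$$\sigma^2\int_0^t s\phi_n(s)\,ds + \sigma^2 t\int_t^T \phi_n(s)\,ds = \lambda_n\phi_n(t)$$ (6.170)

Differentiating Eq. (6.170) with respect to t, we get

$$\sigma^2\int_t^T \phi_n(s)\,ds = \lambda_n\phi_n'(t)$$ (6.171)

Differentiating Eq. (6.171) with respect to t again, we obtain

$$\phi_n''(t) + \frac{\sigma^2}{\lambda_n}\phi_n(t) = 0$$ (6.172)

A general solution of Eq. (6.172) is

$$\phi_n(t) = a_n\sin\omega_n t + b_n\cos\omega_n t \qquad \omega_n = \sigma/\sqrt{\lambda_n}$$

In order to determine the values of $a_n$, $b_n$, and $\lambda_n$ (or $\omega_n$), we need appropriate boundary conditions. From Eq. (6.170), we see that $\phi_n(0) = 0$. This implies that $b_n = 0$. From Eq. (6.171), we see that $\phi_n'(T) = 0$. This implies that $\cos\omega_n T = 0$, that is,

$$\omega_n = \frac{\sigma}{\sqrt{\lambda_n}} = \left(n - \frac{1}{2}\right)\frac{\pi}{T} \qquad \text{or} \qquad \lambda_n = \frac{\sigma^2 T^2}{\left(n - \frac{1}{2}\right)^2\pi^2} \qquad n = 1, 2, \ldots$$ (6.173)

The normalization requirement [Eq. (6.83)] implies that

$$\int_0^T (a_n\sin\omega_n t)^2\,dt = \frac{a_n^2 T}{2} = 1 \quad \Rightarrow \quad a_n = \sqrt{\frac{2}{T}}$$

Thus, the eigenfunctions are given by

$$\phi_n(t) = \sqrt{\frac{2}{T}}\sin\left[\left(n - \frac{1}{2}\right)\frac{\pi}{T}t\right] \qquad 0 < t < T$$ (6.174)

and the Karhunen-Loéve expansion of the Wiener process X(t) is

$$X(t) = \sqrt{\frac{2}{T}}\sum_{n=1}^{\infty} X_n\sin\left[\left(n - \frac{1}{2}\right)\frac{\pi}{T}t\right] \qquad 0 < t < T$$ (6.175)

where $X_n$ are given by

$$X_n = \sqrt{\frac{2}{T}}\int_0^T X(t)\sin\left[\left(n - \frac{1}{2}\right)\frac{\pi}{T}t\right]dt$$

and they are uncorrelated with variance $\lambda_n$.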
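Mercer's theorem [Eq. (6.87)] offers a way to test the eigenpairs found in Prob. 6.39: the partial sums of lambda_n phi_n(t) phi_n(s) should approach sigma^2 min(t, s). The sketch below assumes the eigenvalues lambda_n = sigma^2 T^2 / ((n - 1/2)^2 pi^2) and the sine eigenfunctions of Eq. (6.174); the function names, the number of terms, and the sample points are choices made for this illustration.

```python
import math


def kl_eigenvalue(n, T=1.0, sigma2=1.0):
    """lambda_n for the Wiener process on (0, T)."""
    return sigma2 * T ** 2 / ((n - 0.5) ** 2 * math.pi ** 2)


def kl_eigenfunction(n, t, T=1.0):
    """phi_n(t) = sqrt(2/T) * sin((n - 1/2) * pi * t / T), Eq. (6.174)."""
    return math.sqrt(2.0 / T) * math.sin((n - 0.5) * math.pi * t / T)


def mercer_partial_sum(t, s, terms=2000, T=1.0, sigma2=1.0):
    """Partial sum of lambda_n * phi_n(t) * phi_n(s) over n = 1..terms."""
    return sum(kl_eigenvalue(n, T, sigma2)
               * kl_eigenfunction(n, t, T)
               * kl_eigenfunction(n, s, T)
               for n in range(1, terms + 1))


if __name__ == "__main__":
    # Compare with sigma^2 * min(t, s) for sigma^2 = 1, T = 1.
    print(mercer_partial_sum(0.3, 0.7), min(0.3, 0.7))
```

Since the eigenvalues decay like 1/n^2, the partial sums converge slowly but steadily, and a few thousand terms are enough for agreement to about three decimal places.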


6.40. Find the Karhunen-Loéve expansion of the white normal (or white Gaussian) noise W(t).

From Eq. (6.43),

$$R_W(t, s) = \sigma^2\delta(t - s)$$

Substituting the above expression into Eq. (6.86), we obtain

$$\sigma^2\int_0^T \delta(t - s)\phi_n(s)\,ds = \lambda_n\phi_n(t) \qquad 0 < t < T$$

or [by Eq. (6.44)]

$$\sigma^2\phi_n(t) = \lambda_n\phi_n(t)$$ (6.176)

which indicates that all $\lambda_n = \sigma^2$ and $\phi_n(t)$ are arbitrary. Thus, any complete orthogonal set $\{\phi_n(t)\}$ with corresponding eigenvalues $\lambda_n = \sigma^2$ can be used in the Karhunen-Loéve expansion of the white Gaussian noise.


Fourier Transform of Random Processes 
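6.41. Derive Eq. (6.94).

From Eq. (6.89),

$$\tilde{X}(\omega_1) = \int_{-\infty}^{\infty} X(t)e^{-j\omega_1 t}\,dt \qquad \tilde{X}(\omega_2) = \int_{-\infty}^{\infty} X(s)e^{-j\omega_2 s}\,ds$$

Then

$$R_{\tilde{X}}(\omega_1, \omega_2) = E[\tilde{X}(\omega_1)\tilde{X}^*(\omega_2)] = E\left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} X(t)X^*(s)e^{-j(\omega_1 t - \omega_2 s)}\,dt\,ds\right]$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E[X(t)X^*(s)]e^{-j(\omega_1 t - \omega_2 s)}\,dt\,ds$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_X(t, s)e^{-j[\omega_1 t + (-\omega_2)s]}\,dt\,ds = \tilde{R}_X(\omega_1, -\omega_2)$$

in view of Eq. (6.93).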
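6.42. Derive Eqs. (6.98) and (6.99).

Since X(t) is WSS, by Eq. (6.93), and letting $t - s = \tau$, we have

$$\tilde{R}_X(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_X(t - s)e^{-j(\omega_1 t + \omega_2 s)}\,dt\,ds$$

$$= \int_{-\infty}^{\infty} R_X(\tau)e^{-j\omega_1\tau}\,d\tau\int_{-\infty}^{\infty} e^{-j(\omega_1 + \omega_2)s}\,ds = S_X(\omega_1)\int_{-\infty}^{\infty} e^{-j(\omega_1 + \omega_2)s}\,ds$$

From the Fourier transform pair (Appendix B) $1 \leftrightarrow 2\pi\delta(\omega)$, we have

$$\int_{-\infty}^{\infty} e^{-j\omega t}\,dt = 2\pi\delta(\omega)$$

Hence,

$$\tilde{R}_X(\omega_1, \omega_2) = 2\pi S_X(\omega_1)\delta(\omega_1 + \omega_2)$$

Next, from Eq. (6.94) and the above result, we obtain

$$R_{\tilde{X}}(\omega_1, \omega_2) = \tilde{R}_X(\omega_1, -\omega_2) = 2\pi S_X(\omega_1)\delta(\omega_1 - \omega_2)$$

6.43. Let $\tilde{X}(\omega)$ be the Fourier transform of a random process X(t). If $\tilde{X}(\omega)$ is a white noise with zero mean and autocorrelation function $g(\omega_1)\delta(\omega_1 - \omega_2)$, then show that X(t) is WSS with power spectral density $g(\omega)/2\pi$.

By Eq. (6.91),

$$X(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \tilde{X}(\omega)e^{j\omega t}\,d\omega$$

Then

$$E[X(t)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} E[\tilde{X}(\omega)]e^{j\omega t}\,d\omega = 0$$ (6.177)

Assuming that X(t) is a complex random process, we have

$$R_X(t, s) = E[X(t)X^*(s)] = E\left[\frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \tilde{X}(\omega_1)\tilde{X}^*(\omega_2)e^{j(\omega_1 t - \omega_2 s)}\,d\omega_1\,d\omega_2\right]$$

$$= \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E[\tilde{X}(\omega_1)\tilde{X}^*(\omega_2)]e^{j(\omega_1 t - \omega_2 s)}\,d\omega_1\,d\omega_2$$

$$= \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(\omega_1)\delta(\omega_1 - \omega_2)e^{j(\omega_1 t - \omega_2 s)}\,d\omega_1\,d\omega_2$$

$$= \frac{1}{4\pi^2}\int_{-\infty}^{\infty} g(\omega_1)e^{j\omega_1(t - s)}\,d\omega_1$$ (6.178)

which depends only on $t - s = \tau$. Hence, we conclude that X(t) is WSS.

Setting $t - s = \tau$ and $\omega_1 = \omega$ in Eq. (6.178), we have

$$R_X(\tau) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty} g(\omega)e^{j\omega\tau}\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left[\frac{1}{2\pi}g(\omega)\right]e^{j\omega\tau}\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)e^{j\omega\tau}\,d\omega$$

in view of Eq. (6.24). Thus, we obtain $S_X(\omega) = g(\omega)/2\pi$.

6.44. Verify Eq. (6.104).

By Eq. (6.100),

$$\tilde{X}(\Omega_1) = \sum_{n=-\infty}^{\infty} X(n)e^{-j\Omega_1 n} \qquad \tilde{X}^*(\Omega_2) = \sum_{m=-\infty}^{\infty} X^*(m)e^{j\Omega_2 m}$$

Then

$$R_{\tilde{X}}(\Omega_1, \Omega_2) = E[\tilde{X}(\Omega_1)\tilde{X}^*(\Omega_2)] = \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} E[X(n)X^*(m)]e^{-j(\Omega_1 n - \Omega_2 m)}$$

$$= \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} R_X(n, m)e^{-j[\Omega_1 n + (-\Omega_2)m]} = \tilde{R}_X(\Omega_1, -\Omega_2)$$

in view of Eq. (6.103).

6.45. Derive Eqs. (6.105) and (6.106).

If X(n) is WSS, then $R_X(n, m) = R_X(n - m)$. By Eq. (6.103), and letting $n - m = k$, we have

$$\tilde{R}_X(\Omega_1, \Omega_2) = \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} R_X(n - m)e^{-j(\Omega_1 n + \Omega_2 m)}$$

$$= \sum_{k=-\infty}^{\infty} R_X(k)e^{-j\Omega_1 k}\sum_{m=-\infty}^{\infty} e^{-j(\Omega_1 + \Omega_2)m} = S_X(\Omega_1)\sum_{m=-\infty}^{\infty} e^{-j(\Omega_1 + \Omega_2)m}$$

From the Fourier transform pair (Appendix B) $x(n) = 1 \leftrightarrow 2\pi\delta(\Omega)$, we have

$$\sum_{m=-\infty}^{\infty} e^{-jm(\Omega_1 + \Omega_2)} = 2\pi\delta(\Omega_1 + \Omega_2)$$

Hence,

$$\tilde{R}_X(\Omega_1, \Omega_2) = 2\pi S_X(\Omega_1)\delta(\Omega_1 + \Omega_2)$$

Next, from Eq. (6.104) and the above result, we obtain

$$R_{\tilde{X}}(\Omega_1, \Omega_2) = \tilde{R}_X(\Omega_1, -\Omega_2) = 2\pi S_X(\Omega_1)\delta(\Omega_1 - \Omega_2)$$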


SUPPLEMENTARY PROBLEMS 


6.46. Is the Poisson process X(t) m.s. continuous?

6.47. Let X(t) be defined by (Prob. 5.4)

$$X(t) = Y\cos\omega t \qquad t \ge 0$$

where Y is a uniform r.v. over (0, 1) and $\omega$ is a constant.

(a) Is X(t) m.s. continuous?
(b) Does X(t) have a m.s. derivative?

6.48. Let Z(t) be the random telegraph signal of Prob. 6.18.
(a) Is Z(t) m.s. continuous?
(b) Does Z(t) have a m.s. derivative?

6.49. Let X(t) be a WSS random process, and let X'(t) be its m.s. derivative. Show that E[X(t)X'(t)] = 0.


6.50. Let 
Zt) = 2 ne d 
=—f,  X(@)da 


where X(t) is given by Prob. 6.47 with $\omega = 2\pi/T$.
(a) Find the mean of Z(t).
(b) Find the autocorrelation function of Z(t).


6.51. Consider a WSS random process X(t) with $E[X(t)] = \mu_X$. Let

$$\langle X(t)\rangle_T = \frac{1}{T}\int_{-T/2}^{T/2} X(t)\,dt$$

The process X(t) is said to be ergodic in the mean if

$$\operatorname*{l.i.m.}_{T \to \infty}\langle X(t)\rangle_T = E[X(t)] = \mu_X$$

Find $E[\langle X(t)\rangle_T]$.

6.52. Let $X(t) = A\cos(\omega_0 t + \Theta)$, where A and $\omega_0$ are constants, and $\Theta$ is a uniform r.v. over $(-\pi, \pi)$ (Prob. 5.20). Find the power spectral density of X(t).

6.53. A random process Y(t) is defined by

$$Y(t) = AX(t)\cos(\omega_c t + \Theta)$$

where A and $\omega_c$ are constants, $\Theta$ is a uniform r.v. over $(-\pi, \pi)$, and X(t) is a zero-mean WSS random process with the autocorrelation function $R_X(\tau)$ and the power spectral density $S_X(\omega)$. Furthermore, X(t) and $\Theta$ are independent. Show that Y(t) is WSS, and find the power spectral density of Y(t).

6.54. Consider a discrete-time random process defined by

$$X(n) = \sum_{i=1}^{m} a_i\cos(\Omega_i n + \Theta_i)$$

where $a_i$ and $\Omega_i$ are real constants and $\Theta_i$ are independent uniform r.v.'s over $(-\pi, \pi)$.
(a) Find the mean of X(n).
(b) Find the autocorrelation function of X(n).

6.55. Consider a discrete-time WSS random process X(n) with the autocorrelation function

$$R_X(k) = 10e^{-0.5|k|}$$

Find the power spectral density of X(n).

6.56. Let X(t) and Y(t) be defined by

$$X(t) = U\cos\omega_0 t + V\sin\omega_0 t$$

$$Y(t) = V\cos\omega_0 t - U\sin\omega_0 t$$

where $\omega_0$ is constant and U and V are independent r.v.'s both having zero mean and variance $\sigma^2$.

(a) Find the cross-correlation function of X(t) and Y(t).
(b) Find the cross power spectral density of X(t) and Y(t).


6.57. Verify Eqs. (6.36) and (6.37). 


6.58. Let Y(t) = X(t) + W(t), where X(t) and W(t) are orthogonal and W(t) is a white noise specified by Eq. (6.43) or (6.45). Find the autocorrelation function of Y(t).

6.59. A zero-mean WSS random process X(t) is called band-limited white noise if its spectral density is given by

$$S_X(\omega) = \begin{cases} N_0/2 & |\omega| \le \omega_B \\ 0 & |\omega| > \omega_B \end{cases}$$

Find the autocorrelation function of X(t).

6.60. A WSS random process X(f) is applied to the input of an LTI system with 


impulse response h(t) = 3e ~‘u(t). Find the mean value of Y(#) of the system if 
E(X(0)] = 2. 


6.61. The input X(t) to the RC filter shown in Fig. 6-7 is a white noise specified by Eq. (6.45). Find the mean-square value of Y(t).

Fig. 6-7 RC filter.

6.62. The input X(t) to a differentiator is the random telegraph signal of Prob. 6.18.

(a) Determine the power spectral density of the differentiator output.
(b) Find the mean-square value of the differentiator output.

6.63. Suppose that the input to the filter shown in Fig. 6-8 is a white noise specified by Eq. (6.45). Find the power spectral density of Y(t).


6.64. Verify Eq. (6.67). 


6.65. Suppose that the input to the discrete-time filter shown in Fig. 6-9 is a discrete-time white noise with average power $\sigma^2$. Find the power spectral density of Y(n).

Fig. 6-9

6.66. Using the Karhunen-Loéve expansion of the Wiener process, obtain the Karhunen-Loéve expansion of the white normal noise.

6.67. Let Y(t) = X(t) + W(t), where X(t) and W(t) are orthogonal and W(t) is a white noise specified by Eq. (6.43) or (6.45). Let $\phi_n(t)$ be the eigenfunctions of the integral equation (6.86) and $\lambda_n$ the corresponding eigenvalues.

(a) Show that $\phi_n(t)$ are also the eigenfunctions of the integral equation for the Karhunen-Loéve expansion of Y(t) with $R_Y(t, s)$.

(b) Find the corresponding eigenvalues.

6.68. Suppose that

$$X(t) = \sum_n X_n e^{jn\omega_0 t}$$

where $X_n$ are r.v.'s and $\omega_0$ is a constant. Find the Fourier transform of X(t).

6.69. Let $\tilde{X}(\omega)$ be the Fourier transform of a continuous-time random process X(t). Find the mean of $\tilde{X}(\omega)$.

6.70. Let

$$\tilde{X}(\Omega) = \sum_{n=-\infty}^{\infty} X(n)e^{-j\Omega n}$$

where E[X(n)] = 0 and $E[X(n)X(k)] = \sigma_n^2\delta(n - k)$. Find the mean and the autocorrelation function of $\tilde{X}(\Omega)$.


ANSWERS TO SUPPLEMENTARY PROBLEMS 


6.46. Hint: Use Eq. (5.60) and proceed as in Prob. 6.4. 
Yes. 


6.47. Hint: Use Eq. (5.87) of Prob. 5.12. 
(a) Yes; (b) yes. 


6.48. Hint: Use Eq. (6.132) of Prob. 6.18. 
(a) Yes; (b) no. 


6.49. Hint: Use Eqs. (6.13) [or (6.14)] and (6.117). 


(a). <= i sin cot 
6.50. = ; 
(b) R(t, s)= ; 


> sin wt sin ws 


6.51. 


6.52. 


6.53. 


6.54. 


G55, 


6.56. 


6.57. 


6.61. 


6.62. 


6.63. 


Hy 


2 
Sy(@) = = |d(@ — Wp) )+ d6(@ + @)I 


2 


Sy (@) = - [Sy(@—-o,.)+Sy(O+a,)] 


(a) El X(n)|=0 
(b) Renn b= Dat cos(Q; k) 
6.32 


Sy(Q) = ————_——___ - #<Q<2 
1.368 — 1.213 cos Q 


(a) Ryyt, t+ 1) =—o? sin Wot 


(b) Syf@) = jo7a[8 (@ — @p) — 5 + @p)] 


Hint: Substitute Eq. (6.18) into Eq. (6.34). 
Rt, s) = Rylt, s) + o5(t— s) 


No @p SIN WpTt 
Ry(t)= QO \“B B 
Wpt 


Hint: Use Eq. (6.59). 


Hint: Use Eqs. (6.64) and (6.65). 


o7/(2RC) 


4Aw* 
w” +4A7 
(b) E[Y’()]=% 


(a) Sy (w)= 


Sy) = 07(1 + a? + 2a cos wT) 


6.64. 


6.65. 


6.66. 


6.67. 


6.68. 


6.69. 


6.70. 


Hint: Proceed as in Prob. 6.24. 


SQ) = 62(1 + a* + 2a cos Q) 


Hint: Take the derivative of Eq. (6.175) of Prob. 6.39. 


where W,, are independent normal r.v.’s with the same variance o°. 


[2 - 1\a 
— 4 W,cos|n—-—|—t VSS 
72 " | ;|3 


n=1 


Hint: Use the result of Prob. 6.58. 


(b) 1, +0 


X(w) = 9.22X,5(w — nowy) 


n 


F [uy (Ol = a by (De dt where tly (t) = EL X(¢)] 


E[X(Q)|=0 


, i 2 —j( 2; —Q>))n 


n=—-x 


2 


CHAPTER 7 


Estimation Theory 


7.1 Introduction 


In this chapter, we present a classical estimation theory. There are two basic types 
of estimation problems. In the first type, we are interested in estimating the 
parameters of one or more r.v.’s, and in the second type, we are interested in 
estimating the value of an inaccessible r.v. Y in terms of the observation of an 
accessible r.v. X. 


7.2 Parameter Estimation 


Let X be a r.v. with pdf f(x) and $X_1, \ldots, X_n$ a set of n independent r.v.’s
each with pdf f(x). The set of r.v.’s $(X_1, \ldots, X_n)$ is called a random sample
(or sample vector) of size n of X. Any real-valued function of a random sample
$s(X_1, \ldots, X_n)$ is called a statistic.


Let X be a r.v. with pdf $f(x; \theta)$ which depends on an unknown parameter
$\theta$. Let $(X_1, \ldots, X_n)$ be a random sample of X. In this case, the joint pdf
of $X_1, \ldots, X_n$ is given by

$$f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta) \tag{7.1}$$

where $x_1, \ldots, x_n$ are the values of the observed data taken from the random
sample.


An estimator of $\theta$ is any statistic $s(X_1, \ldots, X_n)$, denoted as

$$\Theta = s(X_1, \ldots, X_n) \tag{7.2}$$

For a particular set of observations $X_1 = x_1, \ldots, X_n = x_n$, the value of the
estimator $s(x_1, \ldots, x_n)$ will be called an estimate of $\theta$ and denoted by
$\hat{\theta}$. Thus, an estimator is a r.v. and an estimate is a particular
realization of it. It is not necessary that an estimate of a parameter be one
single value; instead, the estimate could be a range of values. Estimates which
specify a single value are called point estimates, and estimates which specify
a range of values are called interval estimates.


7.3 Properties of Point Estimators 


A. Unbiased Estimators: 


An estimator $\Theta = s(X_1, \ldots, X_n)$ is said to be an unbiased estimator of the
parameter $\theta$ if

$$E(\Theta) = \theta \tag{7.3}$$

for all possible values of $\theta$. If $\Theta$ is an unbiased estimator, then its
mean square error is given by

$$E[(\Theta - \theta)^2] = E\{[\Theta - E(\Theta)]^2\} = \operatorname{Var}(\Theta) \tag{7.4}$$

That is, its mean square error equals its variance.
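
Definition (7.3) can be checked exactly for small cases. The following is a minimal sketch, not part of the original text: it enumerates every sample of size n from a hypothetical three-point pmf and computes the exact expectation of the sample mean and of the statistic $(1/n)\sum_i (X_i - \bar{X})^2$. The helper names (`expected_value`, `sample_mean`, `s2`) and the pmf are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def expected_value(statistic, pmf, n):
    """Exact E[statistic(X_1, ..., X_n)] for i.i.d. draws from a finite pmf."""
    total = Fraction(0)
    # Enumerate every possible sample together with its probability.
    for sample in product(pmf.items(), repeat=n):
        prob = Fraction(1)
        for _, p in sample:
            prob *= p
        total += prob * statistic([v for v, _ in sample])
    return total


def sample_mean(xs):
    return Fraction(sum(xs), len(xs))


def s2(xs):
    """(1/n) * sum of squared deviations from the sample mean."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


# Hypothetical pmf chosen only for illustration.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
mu = sum(v * p for v, p in pmf.items())
var = sum((v - mu) ** 2 * p for v, p in pmf.items())
n = 3

mean_of_sample_mean = expected_value(sample_mean, pmf, n)  # equals mu
mean_of_s2 = expected_value(s2, pmf, n)                    # equals (n-1)/n * var
```

Under these assumptions the sample mean satisfies Eq. (7.3), while the expectation of the second statistic is $(n-1)\sigma^2/n$ rather than $\sigma^2$, so that statistic is a biased estimator of the variance.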


B. Efficient Estimators: 


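An estimator $\Theta_1$ is said to be a more efficient estimator of the parameter
$\theta$ than the estimator $\Theta_2$ if

1. $\Theta_1$ and $\Theta_2$ are both unbiased estimators of $\theta$.
2. $\operatorname{Var}(\Theta_1) < \operatorname{Var}(\Theta_2)$.

The estimator $\Theta_{MV} = s(X_1, \ldots, X_n)$ is said to be a most efficient (or
minimum variance) unbiased estimator of the parameter $\theta$ if

1. It is an unbiased estimator of $\theta$.
2. $\operatorname{Var}(\Theta_{MV}) \le \operatorname{Var}(\Theta)$ for all unbiased estimators $\Theta$.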


C. Consistent Estimators:

The estimator $\Theta_n$ of $\theta$ based on a random sample of size n is said to be
consistent if for any small $\varepsilon > 0$,

$$\lim_{n \to \infty} P(|\Theta_n - \theta| < \varepsilon) = 1 \tag{7.5}$$

or equivalently,

$$\lim_{n \to \infty} P(|\Theta_n - \theta| \ge \varepsilon) = 0 \tag{7.6}$$

The following two conditions are sufficient for consistency (Prob. 7.5):

$$1. \quad \lim_{n \to \infty} E(\Theta_n) = \theta \tag{7.7}$$

$$2. \quad \lim_{n \to \infty} \operatorname{Var}(\Theta_n) = 0 \tag{7.8}$$


7.4 Maximum-Likelihood Estimation 


Let $f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n; \theta)$ denote the joint pmf of the r.v.’s
$X_1, \ldots, X_n$ when they are discrete, and let it be their joint pdf when they are
continuous. Let

$$L(\theta) = f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n; \theta) \tag{7.9}$$

Now $L(\theta)$ represents the likelihood that the values $x_1, \ldots, x_n$ will be
observed when $\theta$ is the true value of the parameter. Thus, $L(\theta)$ is often
referred to as the likelihood function of the random sample. Let
$\hat{\theta}_{ML} = s(x_1, \ldots, x_n)$ be the maximizing value of $L(\theta)$; that is,

$$L(\hat{\theta}_{ML}) = \max_{\theta} L(\theta) \tag{7.10}$$

Then the maximum-likelihood estimator of $\theta$ is

$$\Theta_{ML} = s(X_1, \ldots, X_n) \tag{7.11}$$

and $\hat{\theta}_{ML}$ is the maximum-likelihood estimate of $\theta$.

Since $L(\theta)$ is a product of either pmf’s or pdf’s, it will always be positive
(for the range of possible values of $\theta$). Thus, $\ln L(\theta)$ can always be
defined, and in determining the maximizing value of $\theta$, it is often useful to
use the fact that $L(\theta)$ and $\ln L(\theta)$ have their maximum at the same value of
$\theta$. Hence, we may also obtain $\hat{\theta}_{ML}$ by maximizing $\ln L(\theta)$.


7.5 Bayes’ Estimation 


Suppose that the unknown parameter $\theta$ is considered to be a r.v. having some
fixed distribution or prior pdf $f(\theta)$. Then $f(\mathbf{x}; \theta)$ is now viewed as a
conditional pdf and written as $f(\mathbf{x} \mid \theta)$, and we can express the joint pdf
of the random sample $(X_1, \ldots, X_n)$ and $\theta$ as

$$f(x_1, \ldots, x_n, \theta) = f(x_1, \ldots, x_n \mid \theta) f(\theta) \tag{7.12}$$

and the marginal pdf of the sample is given by

$$f(x_1, \ldots, x_n) = \int_{R_\theta} f(x_1, \ldots, x_n, \theta)\, d\theta \tag{7.13}$$

where $R_\theta$ is the range of the possible values of $\theta$. The other conditional
pdf,

$$f(\theta \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n, \theta)}{f(x_1, \ldots, x_n)} = \frac{f(x_1, \ldots, x_n \mid \theta) f(\theta)}{f(x_1, \ldots, x_n)} \tag{7.14}$$

is referred to as the posterior pdf of $\theta$. Thus, the prior pdf $f(\theta)$
represents our information about $\theta$ prior to the observation of the outcomes
of $X_1, \ldots, X_n$, and the posterior pdf $f(\theta \mid x_1, \ldots, x_n)$ represents our
information about $\theta$ after having observed the sample.

The conditional mean of $\theta$, defined by

$$\hat{\theta}_B = E(\theta \mid x_1, \ldots, x_n) = \int_{R_\theta} \theta f(\theta \mid x_1, \ldots, x_n)\, d\theta \tag{7.15}$$

is called the Bayes’ estimate of $\theta$, and

$$\Theta_B = E(\theta \mid X_1, \ldots, X_n) \tag{7.16}$$

is called the Bayes’ estimator of $\theta$.


7.6 Mean Square Estimation 


In this section, we deal with the second type of estimation problem, that is,
estimating the value of an inaccessible r.v. Y in terms of the observation of an
accessible r.v. X. In general, the estimator $\hat{Y}$ of Y is given by a function of
X, g(X). Then $Y - \hat{Y} = Y - g(X)$ is called the estimation error, and there is a
cost associated with this error, $C[Y - g(X)]$. We are interested in finding the
function g(X) that minimizes this cost. When X and Y are continuous r.v.’s, the
mean square (m.s.) error is often used as the cost function,

$$C[Y - g(X)] = E\{[Y - g(X)]^2\} \tag{7.17}$$

It can be shown (Prob. 7.17) that the estimator of Y given by

$$\hat{Y} = g(X) = E(Y \mid X) \tag{7.18}$$

is the best estimator in the sense that the m.s. error defined by Eq. (7.17) is a
minimum.
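
To make Eq. (7.18) concrete, here is a small sketch for discrete r.v.’s, an assumption made for illustration since the text states the result for continuous r.v.’s. The function `conditional_mean_estimator` and both joint pmf’s are hypothetical; the function tabulates $g(x) = E(Y \mid x)$ and the resulting m.s. error.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_mean_estimator(joint_pmf):
    """joint_pmf maps (x, y) to a probability.

    Returns ({x: E(Y | X = x)}, m.s. error of the estimator g(X) = E(Y | X)).
    """
    px = defaultdict(Fraction)       # marginal pmf of X
    sum_y = defaultdict(Fraction)    # sum over y of y * P(X = x, Y = y)
    for (x, y), p in joint_pmf.items():
        px[x] += p
        sum_y[x] += p * y
    g = {x: sum_y[x] / px[x] for x in px}
    mse = sum(p * (y - g[x]) ** 2 for (x, y), p in joint_pmf.items())
    return g, mse


# Y = X^2 with X uniform on {-1, 0, 1}: Y is a function of X, so the error is 0.
third = Fraction(1, 3)
pmf_square = {(-1, 1): third, (0, 0): third, (1, 1): third}
g_square, mse_square = conditional_mean_estimator(pmf_square)

# A noisy case: Y is not determined by X, so some error remains.
quarter = Fraction(1, 4)
pmf_noisy = {(0, 0): quarter, (0, 1): quarter, (1, 1): quarter, (1, 2): quarter}
g_noisy, mse_noisy = conditional_mean_estimator(pmf_noisy)
```

In the noisy case the remaining error equals $E(Y^2) - E[g^2(X)]$, which is the form the m.s. error takes when $g(X) = E(Y \mid X)$.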

7.7 Linear Mean Square Estimation 


Now consider the estimator $\hat{Y}$ of Y given by

$$\hat{Y} = g(X) = aX + b \tag{7.19}$$

We would like to find the values of a and b such that the m.s. error defined by

$$e = E[(Y - \hat{Y})^2] = E\{[Y - (aX + b)]^2\} \tag{7.20}$$

is minimum. We maintain that a and b must be such that (Prob. 7.20)

$$E\{[Y - (aX + b)]X\} = 0 \tag{7.21}$$

and a and b are given by

$$a = \frac{\sigma_{XY}}{\sigma_X^2} = \frac{\sigma_Y}{\sigma_X}\rho_{XY} \qquad b = \mu_Y - a\mu_X \tag{7.22}$$

and the minimum m.s. error $e_m$ is (Prob. 7.22)

$$e_m = \sigma_Y^2(1 - \rho_{XY}^2) \tag{7.23}$$

where $\sigma_{XY} = \operatorname{Cov}(X, Y)$ and $\rho_{XY}$ is the correlation coefficient of X and
Y. Note that Eq. (7.21) states that the optimum linear m.s. estimator
$\hat{Y} = aX + b$ of Y is such that the estimation error $Y - \hat{Y} = Y - (aX + b)$ is
orthogonal to the observation X. This is known as the orthogonality principle.
The line y = ax + b is often called a regression line.
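
Equations (7.22) and (7.23) can be evaluated directly from the moments of a joint distribution. The sketch below assumes a hypothetical discrete joint pmf, chosen only so that the arithmetic is exact; the function name `linear_ms_estimator` is illustrative and not from the text.

```python
from fractions import Fraction


def linear_ms_estimator(joint_pmf):
    """joint_pmf maps (x, y) to a probability. Returns (a, b, e_m) of Eqs. (7.22)-(7.23)."""

    def expect(g):
        return sum(p * g(x, y) for (x, y), p in joint_pmf.items())

    mu_x = expect(lambda x, y: x)
    mu_y = expect(lambda x, y: y)
    var_x = expect(lambda x, y: (x - mu_x) ** 2)
    var_y = expect(lambda x, y: (y - mu_y) ** 2)
    cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))

    a = cov_xy / var_x                   # sigma_XY / sigma_X^2
    b = mu_y - a * mu_x                  # mu_Y - a mu_X
    e_m = var_y - cov_xy ** 2 / var_x    # sigma_Y^2 (1 - rho_XY^2)
    return a, b, e_m


quarter = Fraction(1, 4)
pmf = {(0, 0): quarter, (0, 1): quarter, (1, 1): quarter, (1, 2): quarter}
a, b, e_m = linear_ms_estimator(pmf)

# Orthogonality principle (7.21): the error is orthogonal to the observation X.
orthogonality = sum(p * (y - (a * x + b)) * x for (x, y), p in pmf.items())
```

For this pmf the sketch gives a = 1, b = 1/2, and $e_m$ = 1/4, and `orthogonality` evaluates to zero, as Eq. (7.21) requires.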

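Next, we consider the estimator $\hat{Y}$ of Y with a linear combination of the
random sample $(X_1, \ldots, X_n)$ by

$$\hat{Y} = \sum_{i=1}^{n} a_i X_i \tag{7.24}$$

Again, we maintain that in order to produce the linear estimator with the minimum
m.s. error, the coefficients $a_i$ must be such that the following orthogonality
conditions are satisfied (Prob. 7.35):

$$E\left[\left(Y - \sum_{i=1}^{n} a_i X_i\right) X_j\right] = 0 \qquad j = 1, \ldots, n \tag{7.25}$$

Solving Eq. (7.25) for $a_i$, we obtain

$$\mathbf{a} = R^{-1}\mathbf{r} \tag{7.26}$$

where

$$\mathbf{a} = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \qquad \mathbf{r} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} \qquad b_j = E(YX_j) \qquad R = \begin{bmatrix} R_{11} & \cdots & R_{1n} \\ \vdots & \ddots & \vdots \\ R_{n1} & \cdots & R_{nn} \end{bmatrix} \qquad R_{ij} = E(X_i X_j)$$

and $R^{-1}$ is the inverse of R.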
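
Equation (7.26) amounts to solving the linear system $R\mathbf{a} = \mathbf{r}$. The following sketch solves it by Gauss-Jordan elimination with exact fractions, assuming R is invertible. The function name and the numerical moment values are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def solve_normal_equations(R, r):
    """Solve R a = r exactly; R is a list of rows, r a list. Assumes R is invertible."""
    n = len(r)
    # Augmented matrix [R | r] with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(r[i])] for i, row in enumerate(R)]
    for col in range(n):
        # Choose a row with a nonzero pivot and swap it into place.
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        # Eliminate this column from every other row.
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [vi - factor * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[n] for row in aug]


# Hypothetical second moments: R_ij = E(X_i X_j) and b_j = E(Y X_j).
R = [[2, 1], [1, 2]]
r = [3, 3]
coefficients = solve_normal_equations(R, r)  # [1, 1]
```

With these assumed moments the optimum coefficients are $a_1 = a_2 = 1$. For larger n a numerical linear-algebra routine would normally replace this hand-written elimination.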


SOLVED PROBLEMS 


Properties of Point Estimators 


7.1. Let $(X_1, \ldots, X_n)$ be a random sample of X having unknown mean $\mu$. Show
that the estimator of $\mu$ defined by

$$M = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} \tag{7.27}$$

is an unbiased estimator of $\mu$. Note that $\bar{X}$ is known as the sample mean
(Prob. 4.64).

By Eq. (4.108),

$$E(M) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu$$

Thus, M is an unbiased estimator of $\mu$.


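7.2. Let $(X_1, \ldots, X_n)$ be a random sample of X having unknown mean $\mu$ and
variance $\sigma^2$. Show that the estimator of $\sigma^2$ defined by

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{7.28}$$

where $\bar{X}$ is the sample mean, is a biased estimator of $\sigma^2$.

By definition, we have

$$\sigma^2 = E[(X_i - \mu)^2]$$

Now

$$E(S^2) = E\left\{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right\} = E\left\{\frac{1}{n}\sum_{i=1}^{n} [(X_i - \mu) - (\bar{X} - \mu)]^2\right\}$$

$$= E\left\{\frac{1}{n}\left[\sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right]\right\}$$

$$= \frac{1}{n}\sum_{i=1}^{n} E[(X_i - \mu)^2] - E[(\bar{X} - \mu)^2] = \sigma^2 - \sigma_{\bar{X}}^2$$

By Eqs. (4.112) and (7.27), we have

$$\sigma_{\bar{X}}^2 = \operatorname{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{1}{n}\sigma^2$$

Thus,

$$E(S^2) = \sigma^2 - \frac{1}{n}\sigma^2 = \frac{n - 1}{n}\sigma^2 \tag{7.29}$$

which shows that $S^2$ is a biased estimator of $\sigma^2$.

7.3. Let $(X_1, \ldots, X_n)$ be a random sample of a Poisson r.v. X with unknown
parameter $\lambda$.

(a) Show that

$$\Lambda_1 = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad \text{and} \qquad \Lambda_2 = \frac{1}{2}(X_1 + X_2)$$

are both unbiased estimators of $\lambda$.
(b) Which estimator is more efficient?

(a) By Eqs. (2.50) and (4.132), we have

$$E(\Lambda_1) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\lambda) = \lambda$$

$$E(\Lambda_2) = \frac{1}{2}[E(X_1) + E(X_2)] = \frac{1}{2}(2\lambda) = \lambda$$

Thus, both estimators are unbiased estimators of $\lambda$.

(b) By Eqs. (2.51) and (4.136),

$$\operatorname{Var}(\Lambda_1) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2}(n\lambda) = \frac{\lambda}{n}$$

$$\operatorname{Var}(\Lambda_2) = \frac{1}{4}(2\lambda) = \frac{\lambda}{2}$$

Thus, if n > 2, $\Lambda_1$ is a more efficient estimator of $\lambda$ than $\Lambda_2$, since
$\lambda/n < \lambda/2$.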

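7.4. Let $(X_1, \ldots, X_n)$ be a random sample of X with mean $\mu$ and variance
$\sigma^2$. A linear estimator of $\mu$ is defined to be a linear function of
$X_1, \ldots, X_n$, $l(X_1, \ldots, X_n)$. Show that the linear estimator defined by
[Eq. (7.27)],

$$M = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$

is the most efficient linear unbiased estimator of $\mu$.

Assume that

$$M_1 = \sum_{i=1}^{n} a_i X_i$$

is a linear unbiased estimator of $\mu$ with lower variance than M. Since $M_1$ is
unbiased, we must have

$$E(M_1) = \sum_{i=1}^{n} a_i E(X_i) = \mu\sum_{i=1}^{n} a_i = \mu$$

which implies that $\sum_{i=1}^{n} a_i = 1$. By Eq. (4.136),

$$\operatorname{Var}(M) = \frac{1}{n}\sigma^2 \qquad \text{and} \qquad \operatorname{Var}(M_1) = \sigma^2\sum_{i=1}^{n} a_i^2$$

By assumption,

$$\sigma^2\sum_{i=1}^{n} a_i^2 < \frac{1}{n}\sigma^2 \qquad \text{or} \qquad \sum_{i=1}^{n} a_i^2 < \frac{1}{n} \tag{7.30}$$

Consider the sum

$$0 \le \sum_{i=1}^{n}\left(a_i - \frac{1}{n}\right)^2 = \sum_{i=1}^{n} a_i^2 - \frac{2}{n}\sum_{i=1}^{n} a_i + \frac{1}{n} = \sum_{i=1}^{n} a_i^2 - \frac{1}{n}$$

which, by assumption (7.30), is less than 0. This is a contradiction, so no linear
unbiased estimator has lower variance than M; the sum equals 0 only when
$a_i = 1/n$, implying that M is the most efficient linear unbiased estimator of
$\mu$.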


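7.5. Show that if

$$\lim_{n \to \infty} E(\Theta_n) = \theta \qquad \text{and} \qquad \lim_{n \to \infty} \operatorname{Var}(\Theta_n) = 0$$

then the estimator $\Theta_n$ is consistent.

Using Chebyshev’s inequality (2.116), we can write

$$P(|\Theta_n - \theta| \ge \varepsilon) \le \frac{E[(\Theta_n - \theta)^2]}{\varepsilon^2} = \frac{1}{\varepsilon^2} E\{[\Theta_n - E(\Theta_n) + E(\Theta_n) - \theta]^2\}$$

$$= \frac{1}{\varepsilon^2} E\{[\Theta_n - E(\Theta_n)]^2 + [E(\Theta_n) - \theta]^2 + 2[\Theta_n - E(\Theta_n)][E(\Theta_n) - \theta]\}$$

$$= \frac{1}{\varepsilon^2}\left(\operatorname{Var}(\Theta_n) + E\{[E(\Theta_n) - \theta]^2\} + 2E\{[\Theta_n - E(\Theta_n)][E(\Theta_n) - \theta]\}\right)$$

The last term is zero, since $E(\Theta_n) - \theta$ is a constant and
$E[\Theta_n - E(\Theta_n)] = 0$. Thus, if

$$\lim_{n \to \infty} E(\Theta_n) = \theta \qquad \text{and} \qquad \lim_{n \to \infty} \operatorname{Var}(\Theta_n) = 0$$

then

$$\lim_{n \to \infty} P(|\Theta_n - \theta| \ge \varepsilon) = 0$$

that is, $\Theta_n$ is consistent [see Eq. (7.6)].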
7.6. Let $(X_1, \ldots, X_n)$ be a random sample of a uniform r.v. X over (0, a), where
a is unknown. Show that

$$A = \max(X_1, X_2, \ldots, X_n) \tag{7.31}$$

is a consistent estimator of the parameter a.

If X is uniformly distributed over (0, a), then from Eqs. (2.56), (2.57), and
(4.122) of Prob. 4.36, the pdf of $Z = \max(X_1, \ldots, X_n)$ is

$$f_Z(z) = n f_X(z)[F_X(z)]^{n-1} = \frac{n}{a^n} z^{n-1} \qquad 0 < z < a \tag{7.32}$$

Thus,

$$E(A) = \int_0^a z f_Z(z)\, dz = \frac{n}{a^n}\int_0^a z^n\, dz = \frac{n}{n + 1} a$$

and

$$\lim_{n \to \infty} E(A) = a$$

Next,

$$E(A^2) = \int_0^a z^2 f_Z(z)\, dz = \frac{n}{a^n}\int_0^a z^{n+1}\, dz = \frac{n}{n + 2} a^2$$

$$\operatorname{Var}(A) = E(A^2) - [E(A)]^2 = \frac{n a^2}{n + 2} - \frac{n^2 a^2}{(n + 1)^2} = \frac{n}{(n + 2)(n + 1)^2} a^2$$

and

$$\lim_{n \to \infty} \operatorname{Var}(A) = 0$$

Thus, by Eqs. (7.7) and (7.8), A is a consistent estimator of parameter a.
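
The moments found in Prob. 7.6 can be tabulated to see how quickly the two sufficient conditions take hold. The sketch below is illustrative only: the function names are assumptions, and the Chebyshev-type bound is the one used in Prob. 7.5, capped at 1.

```python
from fractions import Fraction


def max_estimator_moments(n, a):
    """Mean and variance of A = max(X_1, ..., X_n) for X uniform over (0, a)."""
    a = Fraction(a)
    mean = n * a / (n + 1)
    second_moment = n * a ** 2 / (n + 2)
    return mean, second_moment - mean ** 2


def chebyshev_bound(n, a, eps):
    """Upper bound on P(|A - a| >= eps) given by E[(A - a)^2] / eps^2."""
    mean, var = max_estimator_moments(n, a)
    mse = var + (mean - Fraction(a)) ** 2   # variance plus squared bias
    return min(Fraction(1), mse / Fraction(eps) ** 2)


mean_4, var_4 = max_estimator_moments(4, 2)            # 8/5 and 8/75
bound_10 = chebyshev_bound(10, 1, Fraction(1, 10))     # bound is trivial (1)
bound_100 = chebyshev_bound(100, 1, Fraction(1, 10))   # bound is about 0.019
```

The bound is loose for small n but shrinks toward 0 as n grows, which is all that consistency requires.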
Maximum-Likelihood Estimation 


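7.7. Let $(X_1, \ldots, X_n)$ be a random sample of a binomial r.v. X with parameters
(m, p), where m is assumed to be known and p unknown. Determine the
maximum-likelihood estimator of p.

The likelihood function is given by [Eq. (2.36)]

$$L(p) = f(x_1, \ldots, x_n; p) = \binom{m}{x_1} p^{x_1}(1 - p)^{m - x_1} \cdots \binom{m}{x_n} p^{x_n}(1 - p)^{m - x_n}$$

$$= \binom{m}{x_1} \cdots \binom{m}{x_n}\, p^{\sum_{i=1}^{n} x_i}(1 - p)^{mn - \sum_{i=1}^{n} x_i}$$

Taking the natural logarithm of the above expression, we get

$$\ln L(p) = \ln c + \left(\sum_{i=1}^{n} x_i\right)\ln p + \left(mn - \sum_{i=1}^{n} x_i\right)\ln(1 - p)$$

where

$$c = \prod_{i=1}^{n} \binom{m}{x_i}$$

and

$$\frac{d}{dp}\ln L(p) = \frac{1}{p}\sum_{i=1}^{n} x_i - \frac{1}{1 - p}\left(mn - \sum_{i=1}^{n} x_i\right)$$

Setting $d[\ln L(p)]/dp = 0$, the maximum-likelihood estimate $\hat{p}_{ML}$ of p is
obtained as

$$\hat{p}_{ML} = \frac{1}{mn}\sum_{i=1}^{n} x_i \tag{7.33}$$

Hence, the maximum-likelihood estimator of p is given by

$$P_{ML} = \frac{1}{mn}\sum_{i=1}^{n} X_i = \frac{1}{m}\bar{X} \tag{7.34}$$

7.8. Let $(X_1, \ldots, X_n)$ be a random sample of a Poisson r.v. with unknown
parameter $\lambda$. Determine the maximum-likelihood estimator of $\lambda$.

The likelihood function is given by [Eq. (2.48)]

$$L(\lambda) = f(x_1, \ldots, x_n; \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\lambda^{\sum_{i=1}^{n} x_i}}{x_1! \cdots x_n!}$$

Thus,

$$\ln L(\lambda) = -n\lambda + \ln\lambda\sum_{i=1}^{n} x_i - \ln c$$

where

$$c = \prod_{i=1}^{n} (x_i!)$$

and

$$\frac{d}{d\lambda}\ln L(\lambda) = -n + \frac{1}{\lambda}\sum_{i=1}^{n} x_i$$

Setting $d[\ln L(\lambda)]/d\lambda = 0$, the maximum-likelihood estimate
$\hat{\lambda}_{ML}$ of $\lambda$ is obtained as

$$\hat{\lambda}_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{7.35}$$

Hence, the maximum-likelihood estimator of $\lambda$ is given by

$$\Lambda_{ML} = s(X_1, \ldots, X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} \tag{7.36}$$

7.9. Let $(X_1, \ldots, X_n)$ be a random sample of an exponential r.v. X with unknown
parameter $\lambda$. Determine the maximum-likelihood estimator of $\lambda$.

The likelihood function is given by [Eq. (2.60)]

$$L(\lambda) = f(x_1, \ldots, x_n; \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda\sum_{i=1}^{n} x_i}$$

Thus,

$$\ln L(\lambda) = n\ln\lambda - \lambda\sum_{i=1}^{n} x_i$$

and

$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i$$

Setting $d[\ln L(\lambda)]/d\lambda = 0$, the maximum-likelihood estimate
$\hat{\lambda}_{ML}$ of $\lambda$ is obtained as

$$\hat{\lambda}_{ML} = \frac{n}{\sum_{i=1}^{n} x_i} \tag{7.37}$$

Hence, the maximum-likelihood estimator of $\lambda$ is given by

$$\Lambda_{ML} = s(X_1, \ldots, X_n) = \frac{n}{\sum_{i=1}^{n} X_i} = \frac{1}{\bar{X}} \tag{7.38}$$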
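
The closed-form maximum-likelihood estimates for the Poisson and exponential cases can be cross-checked by maximizing the log-likelihood numerically. The following is a rough sketch under stated assumptions: the data sets are hypothetical, and the coarse grid search stands in for a proper optimizer.

```python
import math


def poisson_log_likelihood(lam, xs):
    """ln L(lambda) for a Poisson sample, including the constant -ln(x_i!) terms."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)


def exponential_log_likelihood(lam, xs):
    """ln L(lambda) = n ln(lambda) - lambda * sum(x_i) for an exponential sample."""
    return len(xs) * math.log(lam) - lam * sum(xs)


def grid_argmax(log_likelihood, xs, lo=0.01, hi=10.0, steps=9990):
    """Return the grid point in [lo, hi] with the largest log-likelihood."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: log_likelihood(lam, xs))


counts = [2, 0, 3, 1, 4]           # hypothetical Poisson observations
times = [0.5, 1.5, 0.25, 1.75]     # hypothetical exponential observations

poisson_closed_form = sum(counts) / len(counts)   # sample mean
exponential_closed_form = len(times) / sum(times)  # n / sum of x_i

poisson_numeric = grid_argmax(poisson_log_likelihood, counts)
exponential_numeric = grid_argmax(exponential_log_likelihood, times)
```

With a grid spacing of 0.001, the numerical maximizers agree with the closed forms to within the grid resolution.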


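7.10. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. X with unknown mean
$\mu$ and unknown variance $\sigma^2$. Determine the maximum-likelihood estimators of
$\mu$ and $\sigma^2$.

The likelihood function is given by [Eq. (2.71)]

$$L(\mu, \sigma) = f(x_1, \ldots, x_n; \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2\sigma^2}(x_i - \mu)^2\right]$$

$$= \left(\frac{1}{2\pi}\right)^{n/2}\frac{1}{\sigma^n}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2\right]$$

Thus,

$$\ln L(\mu, \sigma) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2$$

In order to find the values of $\mu$ and $\sigma$ maximizing the above, we compute

$$\frac{\partial}{\partial\mu}\ln L(\mu, \sigma) = \frac{1}{\sigma^2}\sum_{i=1}^{n} (x_i - \mu)$$

$$\frac{\partial}{\partial\sigma}\ln L(\mu, \sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n} (x_i - \mu)^2$$

Equating these equations to zero, we get

$$\sum_{i=1}^{n} (x_i - \hat{\mu}_{ML}) = 0$$

and

$$\frac{1}{\hat{\sigma}_{ML}^3}\sum_{i=1}^{n} (x_i - \hat{\mu}_{ML})^2 = \frac{n}{\hat{\sigma}_{ML}}$$

Solving for $\hat{\mu}_{ML}$ and $\hat{\sigma}_{ML}$, the maximum-likelihood estimates of $\mu$
and $\sigma^2$ are given, respectively, by

$$\hat{\mu}_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{7.39}$$

$$\hat{\sigma}_{ML}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu}_{ML})^2 \tag{7.40}$$

Hence, the maximum-likelihood estimators of $\mu$ and $\sigma^2$ are given,
respectively, by

$$M_{ML} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} \tag{7.41}$$

$$S_{ML}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{7.42}$$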
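
Equations (7.39) and (7.40) translate directly into code. The sketch below uses a hypothetical function name and hypothetical data. Note the division by n in the variance estimate, which makes it the biased estimator discussed in Prob. 7.2 rather than the sample variance with divisor n - 1.

```python
def normal_mle(xs):
    """Maximum-likelihood estimates (mu_hat, sigma2_hat) for a normal sample."""
    n = len(xs)
    mu_hat = sum(xs) / n
    # Divisor n (not n - 1): this is the ML estimate of the variance.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat


data = [1.0, 2.0, 3.0, 4.0]   # hypothetical observations
mu_hat, sigma2_hat = normal_mle(data)

# Rescaling by n / (n - 1) gives the unbiased sample variance instead.
unbiased_variance = sigma2_hat * len(data) / (len(data) - 1)
```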


Bayes’ Estimation 


7.11. Let $(X_1, \ldots, X_n)$ be the random sample of a Bernoulli r.v. X with pmf given
by [Eq. (2.32)]

$$f(x; p) = p^x(1 - p)^{1 - x} \qquad x = 0, 1 \tag{7.43}$$

where p, 0 < p < 1, is unknown. Assume that p is a uniform r.v. over (0, 1).
Find the Bayes’ estimator of p.

The prior pdf of p is the uniform pdf; that is,

$$f(p) = 1 \qquad 0 < p < 1$$

The posterior pdf of p is given by

$$f(p \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n, p)}{f(x_1, \ldots, x_n)}$$

Then, by Eq. (7.12),

$$f(x_1, \ldots, x_n, p) = f(x_1, \ldots, x_n \mid p) f(p) = p^{\sum_{i=1}^{n} x_i}(1 - p)^{n - \sum_{i=1}^{n} x_i} = p^m(1 - p)^{n - m}$$

where $m = \sum_{i=1}^{n} x_i$, and by Eq. (7.13),

$$f(x_1, \ldots, x_n) = \int_0^1 f(x_1, \ldots, x_n, p)\, dp = \int_0^1 p^m(1 - p)^{n - m}\, dp$$

Now, from calculus, for integers m and k, we have

$$\int_0^1 p^m(1 - p)^k\, dp = \frac{m!\,k!}{(m + k + 1)!} \tag{7.44}$$

Thus, by Eq. (7.14), the posterior pdf of p is

$$f(p \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n, p)}{f(x_1, \ldots, x_n)} = \frac{(n + 1)!\,p^m(1 - p)^{n - m}}{m!\,(n - m)!}$$

and by Eqs. (7.15) and (7.44),

$$E(p \mid x_1, \ldots, x_n) = \int_0^1 p f(p \mid x_1, \ldots, x_n)\, dp = \frac{(n + 1)!}{m!\,(n - m)!}\int_0^1 p^{m+1}(1 - p)^{n - m}\, dp$$

$$= \frac{(n + 1)!}{m!\,(n - m)!}\,\frac{(m + 1)!\,(n - m)!}{(n + 2)!} = \frac{m + 1}{n + 2} = \frac{1}{n + 2}\left(\sum_{i=1}^{n} x_i + 1\right)$$

Hence, by Eq. (7.16), the Bayes’ estimator of p is

$$P_B = E(p \mid X_1, \ldots, X_n) = \frac{1}{n + 2}\left(\sum_{i=1}^{n} X_i + 1\right) \tag{7.45}$$
it (p|X, " ere | 


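7.12. Let $(X_1, \ldots, X_n)$ be a random sample of an exponential r.v. X with unknown
parameter $\lambda$. Assume that $\lambda$ is itself an exponential r.v. with parameter
$\alpha$. Find the Bayes’ estimator of $\lambda$.

The assumed prior pdf of $\lambda$ is [Eq. (2.48)]

$$f(\lambda) = \begin{cases} \alpha e^{-\alpha\lambda} & \lambda > 0 \\ 0 & \text{otherwise} \end{cases}$$

Now

$$f(x_1, \ldots, x_n \mid \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda\sum_{i=1}^{n} x_i} = \lambda^n e^{-m\lambda}$$

where $m = \sum_{i=1}^{n} x_i$. Then, by Eqs. (7.12) and (7.13),

$$f(x_1, \ldots, x_n) = \int_0^\infty f(x_1, \ldots, x_n \mid \lambda) f(\lambda)\, d\lambda = \int_0^\infty \lambda^n e^{-m\lambda}\alpha e^{-\alpha\lambda}\, d\lambda$$

$$= \alpha\int_0^\infty \lambda^n e^{-(\alpha + m)\lambda}\, d\lambda = \alpha\frac{n!}{(\alpha + m)^{n+1}}$$

By Eq. (7.14), the posterior pdf of $\lambda$ is given by

$$f(\lambda \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n \mid \lambda) f(\lambda)}{f(x_1, \ldots, x_n)} = \frac{(\alpha + m)^{n+1}\lambda^n e^{-(\alpha + m)\lambda}}{n!}$$

Thus, by Eq. (7.15), the Bayes’ estimate of $\lambda$ is

$$\hat{\lambda}_B = E(\lambda \mid x_1, \ldots, x_n) = \int_0^\infty \lambda f(\lambda \mid x_1, \ldots, x_n)\, d\lambda$$

$$= \frac{(\alpha + m)^{n+1}}{n!}\int_0^\infty \lambda^{n+1} e^{-(\alpha + m)\lambda}\, d\lambda = \frac{(\alpha + m)^{n+1}}{n!}\,\frac{(n + 1)!}{(\alpha + m)^{n+2}}$$

$$= \frac{n + 1}{\alpha + m} = \frac{n + 1}{\alpha + \sum_{i=1}^{n} x_i} \tag{7.46}$$

and the Bayes’ estimator of $\lambda$ is

$$\Lambda_B = \frac{n + 1}{\alpha + \sum_{i=1}^{n} X_i} = \frac{n + 1}{\alpha + n\bar{X}} \tag{7.47}$$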
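7.13. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. X with unknown mean
$\mu$ and variance 1. Assume that $\mu$ is itself a normal r.v. with mean 0 and
variance 1. Find the Bayes’ estimator of $\mu$.

The assumed prior pdf of $\mu$ is

$$f(\mu) = \frac{1}{\sqrt{2\pi}} e^{-\mu^2/2}$$

Then by Eq. (7.12),

$$f(x_1, \ldots, x_n, \mu) = f(x_1, \ldots, x_n \mid \mu) f(\mu) = \frac{1}{(2\pi)^{n/2}}\exp\left[-\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2}\right]\frac{1}{\sqrt{2\pi}} e^{-\mu^2/2}$$

$$= \frac{\exp\left(-\sum_{i=1}^{n} x_i^2/2\right)}{(2\pi)^{(n+1)/2}}\exp\left[-\frac{n + 1}{2}\left(\mu^2 - \frac{2\mu}{n + 1}\sum_{i=1}^{n} x_i\right)\right]$$

Then, by Eq. (7.14), the posterior pdf of $\mu$ is given by

$$f(\mu \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n, \mu)}{\int_{-\infty}^{\infty} f(x_1, \ldots, x_n, \mu)\, d\mu} = C\exp\left[-\frac{n + 1}{2}\left(\mu - \frac{1}{n + 1}\sum_{i=1}^{n} x_i\right)^2\right] \tag{7.48}$$

where $C = C(x_1, \ldots, x_n)$ is independent of $\mu$. However, Eq. (7.48) is just the
pdf of a normal r.v. with mean

$$\frac{1}{n + 1}\sum_{i=1}^{n} x_i$$

and variance

$$\frac{1}{n + 1}$$

Hence, the conditional distribution of $\mu$ given $x_1, \ldots, x_n$ is the normal
distribution with this mean and variance. Thus, the Bayes’ estimate of $\mu$ is
given by

$$\hat{\mu}_B = E(\mu \mid x_1, \ldots, x_n) = \frac{1}{n + 1}\sum_{i=1}^{n} x_i \tag{7.49}$$

and the Bayes’ estimator of $\mu$ is

$$M_B = \frac{1}{n + 1}\sum_{i=1}^{n} X_i = \frac{n}{n + 1}\bar{X} \tag{7.50}$$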


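7.14. Let $(X_1, \ldots, X_n)$ be a random sample of a r.v. X with pdf $f(x; \theta)$, where
$\theta$ is an unknown parameter. The statistics L and U determine a
$100(1 - \alpha)$ percent confidence interval (L, U) for the parameter $\theta$ if

$$P(L < \theta < U) \ge 1 - \alpha \qquad 0 < \alpha < 1 \tag{7.51}$$

and $1 - \alpha$ is called the confidence coefficient. Find L and U if X is a normal
r.v. with known variance $\sigma^2$ and mean $\mu$ is an unknown parameter.

If $X = N(\mu; \sigma^2)$, then

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad \text{where} \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

is a standard normal r.v., and hence for a given $\alpha$ we can find a number
$z_{\alpha/2}$ from Table A (Appendix A) such that

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha \tag{7.52}$$

For example, if $1 - \alpha = 0.95$, then $z_{\alpha/2} = z_{0.025} = 1.96$, and if
$1 - \alpha = 0.9$, then $z_{\alpha/2} = z_{0.05} = 1.645$. Now, recalling that
$\sigma > 0$, we have the following equivalent inequality relationships:

$$-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}$$

$$-z_{\alpha/2}(\sigma/\sqrt{n}) < \bar{X} - \mu < z_{\alpha/2}(\sigma/\sqrt{n})$$

$$-\bar{X} - z_{\alpha/2}(\sigma/\sqrt{n}) < -\mu < -\bar{X} + z_{\alpha/2}(\sigma/\sqrt{n})$$

$$\bar{X} - z_{\alpha/2}(\sigma/\sqrt{n}) < \mu < \bar{X} + z_{\alpha/2}(\sigma/\sqrt{n})$$

Thus, we have

$$P[\bar{X} - z_{\alpha/2}(\sigma/\sqrt{n}) < \mu < \bar{X} + z_{\alpha/2}(\sigma/\sqrt{n})] = 1 - \alpha \tag{7.53}$$

and so

$$L = \bar{X} - z_{\alpha/2}(\sigma/\sqrt{n}) \qquad \text{and} \qquad U = \bar{X} + z_{\alpha/2}(\sigma/\sqrt{n}) \tag{7.54}$$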

7.15. Consider a normal r.v. with variance 1.66 and unknown mean $\mu$. Find the
95 percent confidence interval for the mean based on a random sample of
size 10.

As shown in Prob. 7.14, for $1 - \alpha = 0.95$, we have $z_{\alpha/2} = z_{0.025} = 1.96$
and

$$z_{\alpha/2}(\sigma/\sqrt{n}) = 1.96(\sqrt{1.66}/\sqrt{10}) = 0.8$$

Thus, by Eq. (7.54), the 95 percent confidence interval for $\mu$ is

$$(\bar{X} - 0.8, \bar{X} + 0.8)$$
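
The interval of Eq. (7.54) is straightforward to compute. The sketch below uses `statistics.NormalDist` from the Python standard library in place of Table A; the function name and the sample mean of 0 are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, variance, n, confidence=0.95):
    """Confidence interval (L, U) of Eq. (7.54) for a normal mean with known variance."""
    alpha = 1.0 - confidence
    # z_{alpha/2}: the point with upper-tail probability alpha/2 under N(0, 1).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(variance / n)
    return sample_mean - half_width, sample_mean + half_width


# Numbers of Prob. 7.15: variance 1.66, n = 10, 95 percent confidence.
lower, upper = mean_confidence_interval(0.0, 1.66, 10)
half_width = (upper - lower) / 2.0   # approximately 0.8
```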


Mean Square Estimation 


7.16. Find the m.s. estimate of a r.v. Y by a constant c.

By Eq. (7.17), the m.s. error is

$$e = E[(Y - c)^2] = \int_{-\infty}^{\infty} (y - c)^2 f(y)\, dy \tag{7.55}$$

Clearly the m.s. error e depends on c, and it is minimum if

$$\frac{de}{dc} = -2\int_{-\infty}^{\infty} (y - c) f(y)\, dy = 0$$

or

$$c\int_{-\infty}^{\infty} f(y)\, dy = c = \int_{-\infty}^{\infty} y f(y)\, dy$$

Thus, we conclude that the m.s. estimate c of Y is given by

$$\hat{y} = c = \int_{-\infty}^{\infty} y f(y)\, dy = E(Y) \tag{7.56}$$

7.17. Find the m.s. estimator of a r.v. Y by a function g(X) of the r.v. X.

By Eq. (7.17), the m.s. error is

$$e = E\{[Y - g(X)]^2\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [y - g(x)]^2 f(x, y)\, dx\, dy$$

Since $f(x, y) = f(y \mid x) f(x)$, we can write

$$e = \int_{-\infty}^{\infty} f(x)\left\{\int_{-\infty}^{\infty} [y - g(x)]^2 f(y \mid x)\, dy\right\} dx \tag{7.57}$$

Since the integrands above are nonnegative, the m.s. error e is minimum if the
inner integral,

$$\int_{-\infty}^{\infty} [y - g(x)]^2 f(y \mid x)\, dy \tag{7.58}$$

is minimum for every x. Comparing Eq. (7.58) with Eq. (7.55) (Prob. 7.16),
we see that they are the same form if c is changed to g(x) and f(y) is changed
to $f(y \mid x)$. Thus, by the result of Prob. 7.16 [Eq. (7.56)], we conclude that the
m.s. estimate of Y is given by

$$\hat{y} = g(x) = \int_{-\infty}^{\infty} y f(y \mid x)\, dy = E(Y \mid x) \tag{7.59}$$

Hence, the m.s. estimator of Y is

$$\hat{Y} = g(X) = E(Y \mid X) \tag{7.60}$$

7.18. Find the m.s. error if $g(x) = E(Y \mid x)$ is the m.s. estimate of Y.

As we see from Eq. (3.58), the conditional mean $E(Y \mid x)$ of Y, given that X = x,
is a function of x, and by Eq. (4.39),

$$E[E(Y \mid X)] = E(Y) \tag{7.61}$$

Similarly, the conditional mean $E[g(X, Y) \mid x]$ of g(X, Y), given that X = x, is a
function of x. It defines, therefore, the function $E[g(X, Y) \mid X]$ of the r.v. X.
Then

$$E\{E[g(X, Y) \mid X]\} = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} g(x, y) f(y \mid x)\, dy\right] f(x)\, dx$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y) f(y \mid x) f(x)\, dx\, dy \tag{7.62}$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y) f(x, y)\, dx\, dy = E[g(X, Y)]$$

Note that Eq. (7.62) is the generalization of Eq. (7.61). Next, we note that

$$E[g_1(X) g_2(Y) \mid x] = E[g_1(x) g_2(Y) \mid x] = g_1(x) E[g_2(Y) \mid x] \tag{7.63}$$

Then by Eqs. (7.62) and (7.63), we have

$$E[g_1(X) g_2(Y)] = E\{E[g_1(X) g_2(Y) \mid X]\} = E\{g_1(X) E[g_2(Y) \mid X]\} \tag{7.64}$$

Now, setting $g_1(X) = g(X)$ and $g_2(Y) = Y$ in Eq. (7.64), and using Eq. (7.18),
we obtain

$$E[g(X)Y] = E[g(X)E(Y \mid X)] = E[g^2(X)]$$

Thus, the m.s. error is given by

$$e = E\{[Y - g(X)]^2\} = E(Y^2) - 2E[g(X)Y] + E[g^2(X)] = E(Y^2) - E[g^2(X)] \tag{7.65}$$


7.19. Let $Y = X^2$ and X be a uniform r.v. over (-1, 1). Find the m.s. estimator of
Y in terms of X and its m.s. error.

By Eq. (7.18), the m.s. estimate of Y is given by

$$g(x) = E(Y \mid x) = E(X^2 \mid X = x) = x^2$$

Hence, the m.s. estimator of Y is

$$\hat{Y} = X^2 \tag{7.66}$$

The m.s. error is

$$e = E\{[Y - g(X)]^2\} = E\{[X^2 - X^2]^2\} = 0 \tag{7.67}$$
Linear Mean Square Estimation 
7.20. Derive the orthogonality principle (7.21) and Eq. (7.22).

By Eq. (7.20), the m.s. error is

$$e(a, b) = E\{[Y - (aX + b)]^2\}$$

Clearly, the m.s. error e is a function of a and b, and it is minimum if
$\partial e/\partial a = 0$ and $\partial e/\partial b = 0$. Now

$$\frac{\partial e}{\partial a} = E\{2[Y - (aX + b)](-X)\} = -2E\{[Y - (aX + b)]X\}$$

$$\frac{\partial e}{\partial b} = E\{2[Y - (aX + b)](-1)\} = -2E\{[Y - (aX + b)]\}$$

Setting $\partial e/\partial a = 0$ and $\partial e/\partial b = 0$, we obtain

$$E\{[Y - (aX + b)]X\} = 0 \tag{7.68}$$

$$E[Y - (aX + b)] = 0 \tag{7.69}$$

Note that Eq. (7.68) is the orthogonality principle (7.21).

Rearranging Eqs. (7.68) and (7.69), we get

$$E(X^2)a + E(X)b = E(XY)$$

$$E(X)a + b = E(Y)$$

Solving for a and b, we obtain Eq. (7.22); that is,

$$a = \frac{E(XY) - E(X)E(Y)}{E(X^2) - [E(X)]^2} = \frac{\sigma_{XY}}{\sigma_X^2} = \frac{\sigma_Y}{\sigma_X}\rho_{XY}$$

$$b = E(Y) - aE(X) = \mu_Y - a\mu_X$$

where we have used Eqs. (2.31), (3.51), and (3.53).

7.21. Show that the m.s. error defined by Eq. (7.20) is minimum when Eqs. (7.68)
and (7.69) are satisfied.

Assume that $\hat{Y} = cX + d$, where c and d are arbitrary constants. Then

$$e(c, d) = E\{[Y - (cX + d)]^2\} = E\{[Y - (aX + b) + (a - c)X + (b - d)]^2\}$$

$$= E\{[Y - (aX + b)]^2\} + E\{[(a - c)X + (b - d)]^2\}$$

$$\quad + 2(a - c)E\{[Y - (aX + b)]X\} + 2(b - d)E\{[Y - (aX + b)]\}$$

$$= e(a, b) + E\{[(a - c)X + (b - d)]^2\}$$

$$\quad + 2(a - c)E\{[Y - (aX + b)]X\} + 2(b - d)E\{[Y - (aX + b)]\}$$

The last two terms on the right-hand side are zero when Eqs. (7.68) and
(7.69) are satisfied, and the second term on the right-hand side is nonnegative
for any c and d. Thus, $e(c, d) \ge e(a, b)$ for any c and d. Hence, e(a, b) is
minimum.


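7.22. Derive Eq. (7.23).

By Eqs. (7.68) and (7.69), we have

$$E\{[Y - (aX + b)]aX\} = 0 = E\{[Y - (aX + b)]b\}$$

Then

$$e_m = e(a, b) = E\{[Y - (aX + b)]^2\} = E\{[Y - (aX + b)][Y - (aX + b)]\}$$

$$= E\{[Y - (aX + b)]Y\} = E(Y^2) - aE(XY) - bE(Y)$$

Using Eqs. (2.31), (3.51), and (3.53), and substituting the values of a and b
[Eq. (7.22)] in the above expression, the minimum m.s. error is

$$e_m = \sigma_Y^2 + \mu_Y^2 - a(\sigma_{XY} + \mu_X\mu_Y) - (\mu_Y - a\mu_X)\mu_Y$$

$$= \sigma_Y^2 - a\sigma_{XY} = \sigma_Y^2 - \frac{\sigma_{XY}^2}{\sigma_X^2} = \sigma_Y^2\left(1 - \frac{\sigma_{XY}^2}{\sigma_X^2\sigma_Y^2}\right) = \sigma_Y^2(1 - \rho_{XY}^2)$$

which is Eq. (7.23).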


7.23. Let $Y = X^2$, and let X be a uniform r.v. over (-1, 1) (see Prob. 7.19). Find
the linear m.s. estimator of Y in terms of X and its m.s. error.

The linear m.s. estimator of Y in terms of X is

$$\hat{Y} = aX + b$$

where a and b are given by [Eq. (7.22)]

$$a = \frac{\sigma_{XY}}{\sigma_X^2} \qquad b = \mu_Y - a\mu_X$$

Now, by Eqs. (2.58) and (2.56),

$$\mu_X = E(X) = 0$$

$$E(XY) = E(XX^2) = E(X^3) = \frac{1}{2}\int_{-1}^{1} x^3\, dx = 0$$

By Eq. (3.51),

$$\sigma_{XY} = \operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0$$

Thus, a = 0 and b = E(Y), and the linear m.s. estimator of Y is

$$\hat{Y} = b = E(Y) \tag{7.70}$$

and the m.s. error is

$$e = E\{[Y - E(Y)]^2\} = \sigma_Y^2 \tag{7.71}$$


7.24. Find the minimum m.s. error estimator of Y in terms of X when X and Y are
jointly normal r.v.’s.

By Eq. (7.18), the minimum m.s. error estimator of Y in terms of X is

$$\hat{Y} = E(Y \mid X)$$

Now, when X and Y are jointly normal, by Eq. (3.108) (Prob. 3.51), we have

$$E(Y \mid x) = \rho_{XY}\frac{\sigma_Y}{\sigma_X} x + \mu_Y - \rho_{XY}\frac{\sigma_Y}{\sigma_X}\mu_X$$

Hence, the minimum m.s. error estimator of Y is

$$\hat{Y} = E(Y \mid X) = \rho_{XY}\frac{\sigma_Y}{\sigma_X} X + \mu_Y - \rho_{XY}\frac{\sigma_Y}{\sigma_X}\mu_X \tag{7.72}$$

Comparing Eq. (7.72) with Eqs. (7.19) and (7.22), we see that for jointly
normal r.v.’s the linear m.s. estimator is the minimum m.s. error estimator.


SUPPLEMENTARY PROBLEMS 


7.25. Let $(X_1, \ldots, X_n)$ be a random sample of X having unknown mean $\mu$ and
variance $\sigma^2$. Show that the estimator of $\sigma^2$ defined by

$$S_1^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

where $\bar{X}$ is the sample mean, is an unbiased estimator of $\sigma^2$. Note that
$S_1^2$ is often called the sample variance.

7.26. Let $(X_1, \ldots, X_n)$ be a random sample of X having known mean $\mu$ and
unknown variance $\sigma^2$. Show that the estimator of $\sigma^2$ defined by

$$S_0^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2$$

is an unbiased estimator of $\sigma^2$.

7.27. Let $(X_1, \ldots, X_n)$ be a random sample of a binomial r.v. X with parameters
(m, p), where p is unknown. Show that the maximum-likelihood estimator of p
given by Eq. (7.34) is unbiased.

7.28. Let $(X_1, \ldots, X_n)$ be a random sample of a Bernoulli r.v. X with pmf
$f(x; p) = p^x(1 - p)^{1 - x}$, x = 0, 1, where p, 0 < p < 1, is unknown. Find the
maximum-likelihood estimator of p.

7.29. The values of a random sample, 2.9, 0.5, 1.7, 4.3, and 3.2, are obtained
from a r.v. X that is uniformly distributed over the unknown interval (a, b).
Find the maximum-likelihood estimates of a and b.

7.30. In analyzing the flow of traffic through a drive-in bank, the times (in
minutes) between arrivals of 10 customers are recorded as 3.2, 2.1, 5.3, 4.2,
1.2, 2.8, 6.4, 1.5, 1.9, and 3.0. Assuming that the interarrival time is an
exponential r.v. with parameter $\lambda$, find the maximum-likelihood estimate of
$\lambda$.

7.31. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. X with known mean
$\mu$ and unknown variance $\sigma^2$. Find the maximum-likelihood estimator of
$\sigma^2$.

7.32. Let $(X_1, \ldots, X_n)$ be the random sample of a normal r.v. X with mean $\mu$ and
variance $\sigma^2$, where $\mu$ is unknown. Assume that $\mu$ is itself a normal r.v.
with mean $\mu_1$ and variance $\sigma_1^2$. Find the Bayes’ estimate of $\mu$.

7.33. Let $(X_1, \ldots, X_n)$ be the random sample of a normal r.v. X with variance 100
and unknown $\mu$. What sample size n is required such that the width of the 95
percent confidence interval is 5?

7.34. Find a constant a such that if Y is estimated by aX, the m.s. error is
minimum, and also find the minimum m.s. error $e_m$.

7.35. Derive Eqs. (7.25) and (7.26).


ANSWERS TO SUPPLEMENTARY PROBLEMS 


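7.25. Hint: Show that $S_1^2 = \frac{n}{n - 1} S^2$, and use Eq. (7.29).

7.26. Hint: Proceed as in Prob. 7.2.

7.27. Hint: Use Eq. (2.38).

7.28. $P_{ML} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$

7.29. $\hat{a}_{ML} = \min_i x_i = 0.5$, $\hat{b}_{ML} = \max_i x_i = 4.3$

7.30. $\hat{\lambda}_{ML} = n/\sum_{i=1}^{n} x_i = 10/31.6 \approx 0.316$

7.31. $S_{ML}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2$

7.32. $\hat{\mu}_B = \left(\frac{\mu_1}{\sigma_1^2} + \frac{n\bar{x}}{\sigma^2}\right) \Big/ \left(\frac{1}{\sigma_1^2} + \frac{n}{\sigma^2}\right)$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

7.33. n = 62

7.34. $a = E(XY)/E(X^2)$, $e_m = E(Y^2) - [E(XY)]^2/E(X^2)$

7.35. Hint: Proceed as in Prob. 7.20.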


CHAPTER 8 


Decision Theory 


8.1 Introduction 


There are many situations in which we have to make decisions based on 
observations or data that are random variables. The theory behind the 
solutions for these situations is known as decision theory or hypothesis 
testing. In communication or radar technology, decision theory or hypothesis 
testing is known as (signal) detection theory. In this chapter we present a 
brief review of the binary decision theory and various decision tests. 


8.2 Hypothesis Testing 


A. Definitions: 


A statistical hypothesis is an assumption about the probability law of r.v.’s.
Suppose we observe a random sample $(X_1, \ldots, X_n)$ of a r.v. X whose pdf
$f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n; \theta)$ depends on a parameter $\theta$. We wish to test the
assumption $\theta = \theta_0$ against the assumption $\theta = \theta_1$. The assumption
$\theta = \theta_0$ is denoted by $H_0$ and is called the null hypothesis. The assumption
$\theta = \theta_1$ is denoted by $H_1$ and is called the alternative hypothesis.

$H_0$: $\theta = \theta_0$ (Null hypothesis)
$H_1$: $\theta = \theta_1$ (Alternative hypothesis)

A hypothesis is called simple if all parameters are specified exactly.
Otherwise it is called composite. Thus, suppose $H_0$: $\theta = \theta_0$ and
$H_1$: $\theta \ne \theta_0$; then $H_0$ is simple and $H_1$ is composite.


B. Hypothesis Testing and Types of Errors: 

Hypothesis testing is a decision process establishing the validity of a
hypothesis. We can think of the decision process as dividing the observation
space $R^n$ (Euclidean n-space) into two regions $R_0$ and $R_1$. Let
$\mathbf{x} = (x_1, \ldots, x_n)$ be the observed vector. Then if $\mathbf{x} \in R_0$, we will decide on
$H_0$; if $\mathbf{x} \in R_1$, we decide on $H_1$. The region $R_0$ is known as the acceptance
region and the region $R_1$ as the rejection (or critical) region (since the null
hypothesis is rejected). Thus, with the observation vector (or data), one of the
following four actions can happen:

1. $H_0$ true; accept $H_0$
2. $H_0$ true; reject $H_0$ (or accept $H_1$)
3. $H_1$ true; accept $H_1$
4. $H_1$ true; reject $H_1$ (or accept $H_0$)

The first and third actions correspond to correct decisions, and the second
and fourth actions correspond to errors. The errors are classified as

1. Type I error: Reject $H_0$ (or accept $H_1$) when $H_0$ is true.
2. Type II error: Reject $H_1$ (or accept $H_0$) when $H_1$ is true.

Let $P_I$ and $P_{II}$ denote, respectively, the probabilities of Type I and Type
II errors:

$$P_I = P(D_1 \mid H_0) = P(\mathbf{x} \in R_1; H_0) \tag{8.1}$$

$$P_{II} = P(D_0 \mid H_1) = P(\mathbf{x} \in R_0; H_1) \tag{8.2}$$

where $D_i$ (i = 0, 1) denotes the event that the decision is made to accept $H_i$.
$P_I$ is often denoted by $\alpha$ and is known as the level of significance, and
$P_{II}$ is denoted by $\beta$ and $(1 - \beta)$ is known as the power of the test. Note
that since $\alpha$ and $\beta$ represent probabilities of events from the same
decision problem, they are not independent of each other or of the sample size
n. It would be desirable to have a decision process such that both $\alpha$ and
$\beta$ will be small. However, in general, a decrease in one type of error leads
to an increase in the other type for a fixed sample size (Prob. 8.4). The only
way to simultaneously reduce both types of errors is to increase the sample
size (Prob. 8.5). One might also attach some relative importance (or cost) to
the four possible courses of action and minimize the total cost of the decision
(see Sec. 8.3D).
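
The trade-off between $\alpha$ and $\beta$ can be illustrated numerically. The sketch below assumes a particular test that is not taken from the text: the observations are normal with known standard deviation, $H_0$ and $H_1$ specify the means `mu0 < mu1`, and $H_0$ is rejected when the sample mean exceeds a threshold. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def error_probabilities(mu0, mu1, sigma, n, threshold):
    """Return (alpha, beta) for the test 'reject H0 if the sample mean > threshold'.

    Assumes normal observations with known sigma and mu1 > mu0, so the sample
    mean is N(mu, sigma^2 / n) under each hypothesis.
    """
    scale = sigma / sqrt(n)
    alpha = 1.0 - NormalDist(mu0, scale).cdf(threshold)  # P(D1 | H0), Eq. (8.1)
    beta = NormalDist(mu1, scale).cdf(threshold)         # P(D0 | H1), Eq. (8.2)
    return alpha, beta


alpha_mid, beta_mid = error_probabilities(0.0, 1.0, 1.0, 1, 0.5)
# Raising the threshold lowers alpha but raises beta (fixed sample size).
alpha_high, beta_high = error_probabilities(0.0, 1.0, 1.0, 1, 1.0)
# Increasing the sample size lowers both for the same threshold.
alpha_n4, beta_n4 = error_probabilities(0.0, 1.0, 1.0, 4, 0.5)
```

In this example moving the threshold from 0.5 to 1.0 trades a smaller $\alpha$ for a larger $\beta$, while going from n = 1 to n = 4 reduces both, in line with the remarks above.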

The probabilities of correct decisions (actions 1 and 3) may be expressed
as

$$P(D_0 \mid H_0) = P(\mathbf{x} \in R_0; H_0) \tag{8.3}$$

$$P(D_1 \mid H_1) = P(\mathbf{x} \in R_1; H_1) \tag{8.4}$$

In radar signal detection, the two hypotheses are

$H_0$: No target exists
$H_1$: Target is present

In this case, the probability of a Type I error $P_I = P(D_1 \mid H_0)$ is often
referred to as the false-alarm probability (denoted by $P_F$), the probability of
a Type II error $P_{II} = P(D_0 \mid H_1)$ as the miss probability (denoted by $P_M$), and
$P(D_1 \mid H_1)$ as the detection probability (denoted by $P_D$). The cost of failing to
detect a target cannot be easily determined. In general we set a value of $P_F$
which is acceptable and seek a decision test that constrains $P_F$ to this value
while maximizing $P_D$ (or equivalently minimizing $P_M$). This test is known as the
Neyman-Pearson test (see Sec. 8.3C).


8.3 Decision Tests 


A. Maximum-Likelihood Test: 


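Let $\mathbf{x}$ be the observation vector and $P(\mathbf{x} \mid H_i)$, i = 0, 1, denote the
probability of observing $\mathbf{x}$ given that $H_i$ was true. In the maximum-likelihood
test, the decision regions $R_0$ and $R_1$ are selected as

$$R_0 = \{\mathbf{x}: P(\mathbf{x} \mid H_0) > P(\mathbf{x} \mid H_1)\} \qquad R_1 = \{\mathbf{x}: P(\mathbf{x} \mid H_0) < P(\mathbf{x} \mid H_1)\} \tag{8.5}$$

Thus, the maximum-likelihood test can be expressed as

$$d(\mathbf{x}) = \begin{cases} H_0 & \text{if } P(\mathbf{x} \mid H_0) > P(\mathbf{x} \mid H_1) \\ H_1 & \text{if } P(\mathbf{x} \mid H_0) < P(\mathbf{x} \mid H_1) \end{cases} \tag{8.6}$$

The above decision test can be rewritten as

$$\frac{P(\mathbf{x} \mid H_1)}{P(\mathbf{x} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \tag{8.7}$$

If we define the likelihood ratio $\Lambda(\mathbf{x})$ as

$$\Lambda(\mathbf{x}) = \frac{P(\mathbf{x} \mid H_1)}{P(\mathbf{x} \mid H_0)} \tag{8.8}$$

then the maximum-likelihood test (8.7) can be expressed as

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} 1 \tag{8.9}$$

which is called the likelihood ratio test, and 1 is called the threshold value of
the test.

Note that the likelihood ratio $\Lambda(\mathbf{x})$ is also often expressed as

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)} \tag{8.10}$$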


B. MAP Test: 


Let $P(H_i \mid \mathbf{x})$, $i = 0, 1$, denote the probability that $H_i$ was true given a
particular value of $\mathbf{x}$. The conditional probability $P(H_i \mid \mathbf{x})$ is called a
posteriori (or posterior) probability, that is, a probability that is computed
after an observation has been made. The probability $P(H_i)$, $i = 0, 1$, is called
a priori (or prior) probability. In the maximum a posteriori (MAP) test, the
decision regions $R_0$ and $R_1$ are selected as

$$R_0 = \{\mathbf{x}: P(H_0 \mid \mathbf{x}) > P(H_1 \mid \mathbf{x})\}$$
$$R_1 = \{\mathbf{x}: P(H_0 \mid \mathbf{x}) < P(H_1 \mid \mathbf{x})\} \tag{8.11}$$

Thus, the MAP test is given by

$$d(\mathbf{x}) = \begin{cases} H_0 & \text{if } P(H_0 \mid \mathbf{x}) > P(H_1 \mid \mathbf{x}) \\ H_1 & \text{if } P(H_0 \mid \mathbf{x}) < P(H_1 \mid \mathbf{x}) \end{cases} \tag{8.12}$$

which can be rewritten as

$$\frac{P(H_1 \mid \mathbf{x})}{P(H_0 \mid \mathbf{x})} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \tag{8.13}$$

Using Bayes’ rule [Eq. (1.58)], Eq. (8.13) reduces to

$$\frac{P(\mathbf{x} \mid H_1)P(H_1)}{P(\mathbf{x} \mid H_0)P(H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \tag{8.14}$$

Using the likelihood ratio $\Lambda(\mathbf{x})$ defined in Eq. (8.8), the MAP test can be
expressed in the following likelihood ratio test as

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta = \frac{P(H_0)}{P(H_1)} \tag{8.15}$$

where $\eta = P(H_0)/P(H_1)$ is the threshold value for the MAP test. Note that
when $P(H_0) = P(H_1)$, the maximum-likelihood test is also the MAP test.
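
The reduction of the posterior ratio to a likelihood ratio test can be written out step by step. The sketch below assumes the usual form of Bayes’ rule, in which both posteriors share the same denominator $P(\mathbf{x})$ (the total probability of the observation, a symbol introduced here only for this illustration), so that it cancels in the ratio:

```latex
\frac{P(H_1 \mid \mathbf{x})}{P(H_0 \mid \mathbf{x})}
  = \frac{P(\mathbf{x} \mid H_1)\,P(H_1)/P(\mathbf{x})}
         {P(\mathbf{x} \mid H_0)\,P(H_0)/P(\mathbf{x})}
  = \Lambda(\mathbf{x})\,\frac{P(H_1)}{P(H_0)}
```

Comparing the right-hand side with 1 and moving the ratio of priors to the other side gives the threshold $P(H_0)/P(H_1)$ of the MAP test.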


C. Neyman-Pearson Test: 


As mentioned, it is not possible to simultaneously minimize both $\alpha$ ($= P_{\mathrm{I}}$)
and $\beta$ ($= P_{\mathrm{II}}$). The Neyman-Pearson test provides a workable solution to this
problem in that the test minimizes $\beta$ for a given level of $\alpha$. Hence, the
Neyman-Pearson test is the test which maximizes the power of the test $1 - \beta$
for a given level of significance $\alpha$. In the Neyman-Pearson test, the critical
(or rejection) region $R_1$ is selected such that $1 - \beta = 1 - P(D_0 \mid H_1) = P(D_1 \mid H_1)$
is maximum subject to the constraint $\alpha = P(D_1 \mid H_0) = \alpha_0$. This is a
classical problem in optimization: maximizing a function subject to a
constraint, which can be solved by the use of the Lagrange multiplier method.
We thus construct the objective function

$$J = (1 - \beta) - \lambda(\alpha - \alpha_0) \tag{8.16}$$

where $\lambda > 0$ is a Lagrange multiplier. Then the critical region $R_1$ is chosen to
maximize $J$. It can be shown that the Neyman-Pearson test can be expressed
in terms of the likelihood ratio test as (Prob. 8.8)

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta = \lambda \tag{8.17}$$

where the threshold value $\eta$ of the test is equal to the Lagrange multiplier $\lambda$,
which is chosen to satisfy the constraint $\alpha = \alpha_0$.


D. Bayes’ Test: 


Let $C_{ij}$ be the cost associated with $(D_i, H_j)$, which denotes the event that we
accept $H_i$ when $H_j$ is true. Then the average cost, which is known as the
Bayes’ risk, can be written as

$$\bar{C} = C_{00}P(D_0, H_0) + C_{10}P(D_1, H_0) + C_{01}P(D_0, H_1) + C_{11}P(D_1, H_1) \tag{8.18}$$

where $P(D_i, H_j)$ denotes the probability that we accept $H_i$ when $H_j$ is true.
By Bayes’ rule (1.42), we have

$$\bar{C} = C_{00}P(D_0 \mid H_0)P(H_0) + C_{10}P(D_1 \mid H_0)P(H_0) + C_{01}P(D_0 \mid H_1)P(H_1) + C_{11}P(D_1 \mid H_1)P(H_1) \tag{8.19}$$

In general, we assume that

$$C_{10} > C_{00} \qquad C_{01} > C_{11} \tag{8.20}$$

since it is reasonable to assume that the cost of making an incorrect decision
is higher than the cost of making a correct decision. The test that minimizes
the average cost $\bar{C}$ is called the Bayes’ test, and it can be expressed in terms
of the likelihood ratio test as (Prob. 8.10)

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta = \frac{(C_{10} - C_{00})P(H_0)}{(C_{01} - C_{11})P(H_1)} \tag{8.21}$$

Note that when $C_{10} - C_{00} = C_{01} - C_{11}$, the Bayes’ test (8.21) and the MAP
test (8.15) are identical.


E. Minimum Probability of Error Test: 
If we set $C_{00} = C_{11} = 0$ and $C_{01} = C_{10} = 1$ in Eq. (8.18), we have

$$\bar{C} = P(D_1, H_0) + P(D_0, H_1) = P_e \tag{8.22}$$

which is just the probability of making an incorrect decision. Thus, in this
case, the Bayes’ test yields the minimum probability of error, and Eq. (8.21)
becomes

$$\Lambda(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta = \frac{P(H_0)}{P(H_1)} \tag{8.23}$$

We see that the minimum probability of error test is the same as the MAP
test.
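
The relationship between the Bayes’, MAP, and maximum-likelihood thresholds can be checked with a few lines of Python. This is only an illustrative sketch: the function names `bayes_threshold` and `decide` are hypothetical and not part of the text, and the likelihood ratio is assumed to have been computed elsewhere.

```python
def bayes_threshold(c00, c01, c10, c11, p_h0):
    """Threshold eta of Eq. (8.21) for the test Lambda(x) >< eta."""
    if not (c10 > c00 and c01 > c11):
        raise ValueError("Eq. (8.20) requires C10 > C00 and C01 > C11")
    if not 0.0 < p_h0 < 1.0:
        raise ValueError("P(H0) must lie strictly between 0 and 1")
    p_h1 = 1.0 - p_h0
    return (c10 - c00) * p_h0 / ((c01 - c11) * p_h1)


def decide(likelihood_ratio, eta):
    """Return 1 (accept H1) if Lambda(x) > eta, else 0 (accept H0)."""
    return 1 if likelihood_ratio > eta else 0


# Minimum-probability-of-error costs reproduce the MAP threshold P(H0)/P(H1).
eta_map = bayes_threshold(c00=0, c01=1, c10=1, c11=0, p_h0=2 / 3)
# With the same costs and equal priors the threshold is 1 (maximum-likelihood test).
eta_ml = bayes_threshold(c00=0, c01=1, c10=1, c11=0, p_h0=0.5)
```

The same observation can therefore lead to different decisions under the two tests: a likelihood ratio of 1.5 exceeds `eta_ml` but not `eta_map`.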


F. Minimax Test: 


We have seen that the Bayes’ test requires the a priori probabilities $P(H_0)$
and $P(H_1)$. Frequently, these probabilities are not known. In such a case, the
Bayes’ test cannot be applied, and the following minimax (min-max) test
may be used. In the minimax test, we use the Bayes’ test which corresponds
to the least favorable $P(H_0)$ (Prob. 8.12). In the minimax test, the critical
region $R_1^*$ is defined by

$$\max_{P(H_0)} \bar{C}[P(H_0), R_1^*] = \min_{R_1} \max_{P(H_0)} \bar{C}[P(H_0), R_1] < \max_{P(H_0)} \bar{C}[P(H_0), R_1] \tag{8.24}$$

for all $R_1 \ne R_1^*$. In other words, $R_1^*$ is the critical region which yields the
minimum Bayes’ risk for the least favorable $P(H_0)$. Assuming that the
minimization and maximization operations are interchangeable, then we
have

$$\min_{R_1} \max_{P(H_0)} \bar{C}[P(H_0), R_1] = \max_{P(H_0)} \min_{R_1} \bar{C}[P(H_0), R_1] \tag{8.25}$$

The minimization of $\bar{C}[P(H_0), R_1]$ with respect to $R_1$ is simply the Bayes’
test, so that

$$\min_{R_1} \bar{C}[P(H_0), R_1] = \bar{C}^*[P(H_0)] \tag{8.26}$$

where $\bar{C}^*[P(H_0)]$ is the minimum Bayes’ risk associated with the a priori
probability $P(H_0)$. Thus, Eq. (8.25) states that we may find the minimax test
by finding the Bayes’ test for the least favorable $P(H_0)$, that is, the $P(H_0)$
which maximizes $\bar{C}^*[P(H_0)]$.
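
The search for the least favorable prior can be sketched numerically. The example below assumes the conditional pdf’s and costs of Prob. 8.11, namely $f(x \mid H_0) = \frac{1}{2}e^{-|x|}$, $f(x \mid H_1) = e^{-2|x|}$, $C_{00} = C_{11} = 0$, $C_{01} = 2$, $C_{10} = 1$, for which the Bayes’ test accepts $H_1$ when $|x| < \delta$. The function names are hypothetical, and a plain grid search stands in for the analytic maximization used in Prob. 8.12.

```python
import math


def bayes_delta(p0):
    """delta such that the Bayes' test accepts H1 when |x| < delta."""
    return math.log(4.0 * (1.0 - p0) / p0)


def min_bayes_risk(p0):
    """Minimum Bayes' risk C*[P(H0)] for the assumed pdf's and costs."""
    delta = bayes_delta(p0)
    if delta <= 0.0:
        # P(H0) >= 0.8: H0 is always accepted, so only Type II errors occur.
        return 2.0 * (1.0 - p0)
    p_type1 = 1.0 - math.exp(-delta)   # P(D1|H0): |x| < delta under H0
    p_type2 = math.exp(-2.0 * delta)   # P(D0|H1): |x| > delta under H1
    return p_type1 * p0 + 2.0 * p_type2 * (1.0 - p0)


# Grid search over P(H0) for the prior that maximizes the minimum risk.
grid = [k / 10000 for k in range(1, 10000)]
p0_least_favorable = max(grid, key=min_bayes_risk)
delta_minimax = bayes_delta(p0_least_favorable)
print(p0_least_favorable, delta_minimax, min_bayes_risk(p0_least_favorable))
```

The grid resolution limits the accuracy of the result; a finer grid or a scalar optimizer could be substituted without changing the idea.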


SOLVED PROBLEMS 


Hypothesis Testing 


8.1. Suppose a manufacturer of memory chips observes that the probability
of chip failure is $p = 0.05$. A new procedure is introduced to improve
the design of chips. To test this new procedure, 200 chips could be
produced using this new procedure and tested. Let r.v. $X$ denote the
number of these 200 chips that fail. We set the test rule that we would
accept the new procedure if $X \le 5$. Let

$H_0$: $p = 0.05$ (No change hypothesis)

$H_1$: $p < 0.05$ (Improvement hypothesis)

Find the probability of a Type I error.

If we assume that these tests using the new procedure are independent
and have the same probability of failure on each test, then $X$ is a
binomial r.v. with parameters $(n, p) = (200, p)$. We make a Type I error
if $X \le 5$ when in fact $p = 0.05$. Thus, using Eq. (2.37), we have

$$P_{\mathrm{I}} = P(D_1 \mid H_0) = P(X \le 5; p = 0.05) = \sum_{k=0}^{5} \binom{200}{k} (0.05)^k (0.95)^{200-k}$$

Since $n$ is rather large and $p$ is small, these binomial probabilities can
be approximated by Poisson probabilities with $\lambda = np = 200(0.05) = 10$
(see Prob. 2.43). Thus, using Eq. (2.119), we obtain

$$P_{\mathrm{I}} \approx \sum_{k=0}^{5} e^{-10} \frac{10^k}{k!} = 0.067$$

Note that $H_0$ is a simple hypothesis but $H_1$ is a composite hypothesis.

8.2. Consider again the memory chip manufacturing problem of Prob. 8.1.
Now let
$H_0$: $p = 0.05$ (No change hypothesis)

$H_1$: $p = 0.02$ (Improvement hypothesis)

Again our rule is, we would reject the new procedure if $X > 5$. Find the
probability of a Type II error.

Now both hypotheses are simple. We make a Type II error if $X > 5$
when in fact $p = 0.02$. Hence, by Eq. (2.37),

$$P_{\mathrm{II}} = P(D_0 \mid H_1) = P(X > 5; p = 0.02) = \sum_{k=6}^{200} \binom{200}{k} (0.02)^k (0.98)^{200-k}$$

Again using the Poisson approximation with $\lambda = np = 200(0.02) = 4$,
we obtain

$$P_{\mathrm{II}} \approx 1 - \sum_{k=0}^{5} e^{-4} \frac{4^k}{k!} = 0.215$$

8.3. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. $X$ with mean $\mu$
and variance 100. Let


$$H_0: \mu = 50 \qquad H_1: \mu = \mu_1 \ (> 50)$$

and sample size $n = 25$. As a decision procedure, we use the rule to
reject $H_0$ if $\bar{x} \ge 52$, where $\bar{x}$ is the value of the sample mean $\bar{X}$ defined
by Eq. (7.27).

(a) Find the probability of rejecting $H_0$: $\mu = 50$ as a function of $\mu$ ($> 50$).

(b) Find the probability of a Type I error $\alpha$.

(c) Find the probability of a Type II error $\beta$ (i) when $\mu_1 = 53$ and (ii)
when $\mu_1 = 55$.

(a) Since the test calls for the rejection of $H_0$: $\mu = 50$ when $\bar{x} \ge 52$,
the probability of rejecting $H_0$ is given by

$$g(\mu) = P(\bar{X} \ge 52; \mu) \tag{8.27}$$

Now, by Eqs. (4.136) and (7.27), we have

$$\operatorname{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{1}{n}\sigma^2 = \frac{100}{25} = 4$$

Thus, $\bar{X}$ is $N(\mu; 4)$, and using Eq. (2.74), we obtain

$$g(\mu) = P\left(\frac{\bar{X} - \mu}{2} \ge \frac{52 - \mu}{2}; \mu\right) = 1 - \Phi\left(\frac{52 - \mu}{2}\right) \tag{8.28}$$

The function $g(\mu)$ is known as the power function of the test, and
the value of $g(\mu)$ at $\mu = \mu_1$, $g(\mu_1)$, is called the power at $\mu_1$.

(b) Note that the power at $\mu = 50$, $g(50)$, is the probability of
rejecting $H_0$: $\mu = 50$ when $H_0$ is true, that is, a Type I error. Thus,
using Table A (Appendix A), we obtain

$$\alpha = P_{\mathrm{I}} = g(50) = 1 - \Phi\left(\frac{52 - 50}{2}\right) = 1 - \Phi(1) = 0.1587$$

(c) Note that the power at $\mu = \mu_1$, $g(\mu_1)$, is the probability of rejecting
$H_0$: $\mu = 50$ when $\mu = \mu_1$. Thus, $1 - g(\mu_1)$ is the probability of
accepting $H_0$ when $\mu = \mu_1$, that is, the probability of a Type II
error $\beta$.

(i) Setting $\mu = \mu_1 = 53$ in Eq. (8.28) and using Table A (Appendix A),
we obtain

$$\beta = P_{\mathrm{II}} = 1 - g(53) = \Phi\left(\frac{52 - 53}{2}\right) = \Phi\left(-\frac{1}{2}\right) = 1 - \Phi\left(\frac{1}{2}\right) = 0.3085$$

(ii) Similarly, for $\mu = \mu_1 = 55$ we obtain

$$\beta = P_{\mathrm{II}} = 1 - g(55) = \Phi\left(\frac{52 - 55}{2}\right) = \Phi\left(-\frac{3}{2}\right) = 1 - \Phi\left(\frac{3}{2}\right) = 0.0668$$

Notice that clearly, the probability of a Type II error depends on
the value of $\mu_1$.

8.4. Consider the binary decision problem of Prob. 8.3. We modify the
decision rule such that we reject $H_0$ if $\bar{x} \ge c$.


(a) Find the value of $c$ such that the probability of a Type I error $\alpha = 0.05$.

(b) Find the probability of a Type II error $\beta$ when $\mu_1 = 55$ with the
modified decision rule.

(a) Using the result of part (b) in Prob. 8.3, $c$ is selected such that
[see Eq. (8.27)]

$$\alpha = g(50) = P(\bar{X} \ge c; \mu = 50) = 0.05$$

However, when $\mu = 50$, $\bar{X} = N(50; 4)$, and [see Eq. (8.28)]

$$g(50) = P\left(\frac{\bar{X} - 50}{2} \ge \frac{c - 50}{2}; \mu = 50\right) = 1 - \Phi\left(\frac{c - 50}{2}\right) = 0.05$$

From Table A (Appendix A), we have $\Phi(1.645) = 0.95$. Thus,

$$\frac{c - 50}{2} = 1.645 \quad \text{and} \quad c = 50 + 2(1.645) = 53.29$$

(b) The power function $g(\mu)$ with the modified decision rule is

$$g(\mu) = P(\bar{X} \ge 53.29; \mu) = P\left(\frac{\bar{X} - \mu}{2} \ge \frac{53.29 - \mu}{2}; \mu\right) = 1 - \Phi\left(\frac{53.29 - \mu}{2}\right)$$

Setting $\mu = \mu_1 = 55$ and using Table A (Appendix A), we obtain

$$\beta = P_{\mathrm{II}} = 1 - g(55) = \Phi\left(\frac{53.29 - 55}{2}\right) = \Phi(-0.855) = 1 - \Phi(0.855) = 0.1963$$

Comparing with the results of Prob. 8.3, we notice that with the
change of the decision rule, $\alpha$ is reduced from 0.1587 to 0.05, but
$\beta$ is increased from 0.0668 to 0.1963.


8.5. Redo Prob. 8.4 for the case where the sample size $n = 100$.

(a) With $n = 100$, we have

$$\operatorname{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{1}{n}\sigma^2 = \frac{100}{100} = 1$$

As in part (a) of Prob. 8.4, $c$ is selected so that

$$\alpha = g(50) = P(\bar{X} \ge c; \mu = 50) = 0.05$$

Since $\bar{X} = N(50; 1)$, we have

$$g(50) = P(\bar{X} - 50 \ge c - 50; \mu = 50) = 1 - \Phi(c - 50) = 0.05$$

Thus,

$$c - 50 = 1.645 \quad \text{and} \quad c = 51.645$$

(b) The power function is

$$g(\mu) = P(\bar{X} \ge 51.645; \mu) = P(\bar{X} - \mu \ge 51.645 - \mu; \mu) = 1 - \Phi(51.645 - \mu)$$

Setting $\mu = \mu_1 = 55$ and using Table A (Appendix A), we obtain

$$\beta = P_{\mathrm{II}} = 1 - g(55) = \Phi(51.645 - 55) = \Phi(-3.355) \approx 0.0004$$

Notice that with sample size $n = 100$, both $\alpha$ and $\beta$ have decreased
from their respective original values of 0.1587 and 0.0668 when $n = 25$.
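
The trade-off worked out in Probs. 8.3 to 8.5 can be reproduced numerically. The sketch below assumes the same setting (normal samples with $\sigma = 10$, $H_0$: $\mu = 50$, alternative $\mu_1 = 55$) and uses `statistics.NormalDist` from the Python standard library in place of Table A; the function names `power` and `critical_value` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; cdf() plays the role of Phi


def power(mu, c, n, sigma=10.0):
    """g(mu) = P(sample mean >= c; mu) for a normal sample of size n."""
    std_of_mean = sigma / n ** 0.5
    return 1.0 - STD_NORMAL.cdf((c - mu) / std_of_mean)


def critical_value(alpha, mu0, n, sigma=10.0):
    """c such that the Type I error probability g(mu0) equals alpha."""
    return mu0 + STD_NORMAL.inv_cdf(1.0 - alpha) * sigma / n ** 0.5


cases = [
    (25, 52.0),                               # rule of Prob. 8.3
    (25, critical_value(0.05, 50.0, 25)),     # rule of Prob. 8.4
    (100, critical_value(0.05, 50.0, 100)),   # rule of Prob. 8.5
]
for n, c in cases:
    alpha = power(50.0, c, n)          # Type I error probability
    beta = 1.0 - power(55.0, c, n)     # Type II error probability at mu1 = 55
    print(f"n={n:3d}  c={c:7.3f}  alpha={alpha:.4f}  beta={beta:.4f}")
```

Raising $c$ at fixed $n$ lowers $\alpha$ and raises $\beta$, while increasing $n$ allows both to be reduced, in line with the remarks in the problems above.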


Decision Tests 


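8.6. In a simple binary communication system, during every $T$ seconds,
one of two possible signals $s_0(t)$ and $s_1(t)$ is transmitted. Our two
hypotheses are

$H_0$: $s_0(t)$ was transmitted.

$H_1$: $s_1(t)$ was transmitted.

We assume that

$$s_0(t) = 0 \quad \text{and} \quad s_1(t) = 1 \qquad 0 < t < T$$

The communication channel adds noise $n(t)$, which is a zero-mean
normal random process with variance 1. Let $x(t)$ represent the received
signal:

$$x(t) = s_i(t) + n(t) \qquad i = 0, 1$$

We observe the received signal $x(t)$ at some instant during each
signaling interval. Suppose that we received an observation $x = 0.6$.

(a) Using the maximum likelihood test, determine which signal is
transmitted.

(b) Find $P_{\mathrm{I}}$ and $P_{\mathrm{II}}$.

(a) The received signal under each hypothesis can be written as

$$H_0: x = n \qquad H_1: x = 1 + n$$

Then the pdf of $x$ under each hypothesis is given by

$$f(x \mid H_0) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$
$$f(x \mid H_1) = \frac{1}{\sqrt{2\pi}} e^{-(x-1)^2/2}$$

The likelihood ratio is then given by

$$\Lambda(x) = \frac{f(x \mid H_1)}{f(x \mid H_0)} = e^{(x - 1/2)}$$

By Eq. (8.9), the maximum likelihood test is

$$e^{(x - 1/2)} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

Taking the natural logarithm of the above expression, we get

$$x - \frac{1}{2} \underset{H_0}{\overset{H_1}{\gtrless}} 0 \quad \text{or} \quad x \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}$$

Since $x = 0.6 > \frac{1}{2}$, we determine that signal $s_1(t)$ was transmitted.

(b) The decision regions are given by

$$R_0 = \left\{x: x < \tfrac{1}{2}\right\} = \left(-\infty, \tfrac{1}{2}\right) \qquad R_1 = \left\{x: x > \tfrac{1}{2}\right\} = \left(\tfrac{1}{2}, \infty\right)$$

Then by Eqs. (8.1) and (8.2) and using Table A (Appendix A), we
obtain

$$P_{\mathrm{I}} = P(D_1 \mid H_0) = \int_{R_1} f(x \mid H_0)\,dx = \frac{1}{\sqrt{2\pi}} \int_{1/2}^{\infty} e^{-x^2/2}\,dx = 1 - \Phi\left(\frac{1}{2}\right) = 0.3085$$

$$P_{\mathrm{II}} = P(D_0 \mid H_1) = \int_{R_0} f(x \mid H_1)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{1/2} e^{-(x-1)^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-1/2} e^{-y^2/2}\,dy = \Phi\left(-\frac{1}{2}\right) = 0.3085$$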

8.7. In the binary communication system of Prob. 8.6, suppose that
$P(H_0) = \frac{2}{3}$ and $P(H_1) = \frac{1}{3}$.

(a) Using the MAP test, determine which signal is transmitted when $x = 0.6$.

(b) Find $P_{\mathrm{I}}$ and $P_{\mathrm{II}}$.

(a) Using the result of Prob. 8.6 and Eq. (8.15), the MAP test is given
by

$$e^{(x - 1/2)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P(H_0)}{P(H_1)} = 2$$

Taking the natural logarithm of the above expression, we get

$$x - \frac{1}{2} \underset{H_0}{\overset{H_1}{\gtrless}} \ln 2 \quad \text{or} \quad x \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2} + \ln 2 = 1.193$$

Since $x = 0.6 < 1.193$, we determine that signal $s_0(t)$ was
transmitted.

(b) The decision regions are given by

$$R_0 = \{x: x < 1.193\} = (-\infty, 1.193)$$
$$R_1 = \{x: x > 1.193\} = (1.193, \infty)$$

Thus, by Eqs. (8.1) and (8.2) and using Table A (Appendix A), we
obtain

$$P_{\mathrm{I}} = P(D_1 \mid H_0) = \int_{R_1} f(x \mid H_0)\,dx = \frac{1}{\sqrt{2\pi}} \int_{1.193}^{\infty} e^{-x^2/2}\,dx = 1 - \Phi(1.193) = 0.1164$$

$$P_{\mathrm{II}} = P(D_0 \mid H_1) = \int_{R_0} f(x \mid H_1)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{1.193} e^{-(x-1)^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0.193} e^{-y^2/2}\,dy = \Phi(0.193) = 0.5765$$
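
The maximum likelihood and MAP decisions for this channel differ only in the threshold applied to the same likelihood ratio, which a short script makes explicit. This is a sketch under the assumptions of Prob. 8.6 ($s_0 = 0$, $s_1 = 1$, unit-variance normal noise); the helper names are hypothetical, and `statistics.NormalDist` stands in for Table A.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def likelihood_ratio(x):
    """Lambda(x) = f(x|H1)/f(x|H0) = exp(x - 1/2) for this channel."""
    return math.exp(x - 0.5)


def lrt(x, eta):
    """Return 1 if H1 (s1 sent) is accepted, 0 if H0 (s0 sent) is accepted."""
    return 1 if likelihood_ratio(x) > eta else 0


def error_probabilities(eta):
    """(P_I, P_II) of the test Lambda(x) >< eta, i.e. x >< 1/2 + ln(eta)."""
    x_threshold = 0.5 + math.log(eta)
    p_type1 = 1.0 - PHI(x_threshold)   # noise alone exceeds the threshold
    p_type2 = PHI(x_threshold - 1.0)   # 1 + noise falls below the threshold
    return p_type1, p_type2


x = 0.6
ml_decision = lrt(x, eta=1.0)                  # maximum likelihood test
map_decision = lrt(x, eta=(2 / 3) / (1 / 3))   # MAP test with P(H0) = 2/3
print(ml_decision, error_probabilities(1.0))
print(map_decision, error_probabilities(2.0))
```

Raising the threshold from 1 to 2 lowers the Type I error probability at the expense of the Type II error probability.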


8.8. Derive the Neyman-Pearson test, Eq. (8.17). 


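From Eq. (8.16), the objective function is

$$J = (1 - \beta) - \lambda(\alpha - \alpha_0) = P(D_1 \mid H_1) - \lambda[P(D_1 \mid H_0) - \alpha_0] \tag{8.29}$$

where $\lambda$ is an undetermined Lagrange multiplier which is chosen to
satisfy the constraint $\alpha = \alpha_0$. Now, we wish to choose the critical
region $R_1$ to maximize $J$. Using Eqs. (8.1) and (8.2), we have

$$J = \int_{R_1} f(\mathbf{x} \mid H_1)\,d\mathbf{x} - \lambda\left[\int_{R_1} f(\mathbf{x} \mid H_0)\,d\mathbf{x} - \alpha_0\right] = \int_{R_1} [f(\mathbf{x} \mid H_1) - \lambda f(\mathbf{x} \mid H_0)]\,d\mathbf{x} + \lambda\alpha_0 \tag{8.30}$$

To maximize $J$ by selecting the critical region $R_1$, we select $\mathbf{x} \in R_1$
such that the integrand in Eq. (8.30) is positive. Thus, $R_1$ is given by

$$R_1 = \{\mathbf{x}: [f(\mathbf{x} \mid H_1) - \lambda f(\mathbf{x} \mid H_0)] > 0\}$$

and the Neyman-Pearson test is given by

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda$$

and $\lambda$ is determined such that the constraint

$$\alpha = P_{\mathrm{I}} = P(D_1 \mid H_0) = \int_{R_1} f(\mathbf{x} \mid H_0)\,d\mathbf{x} = \alpha_0$$

is satisfied.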


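8.9. Consider the binary communication system of Prob. 8.6 and suppose
that we require that $\alpha_0 = P_{\mathrm{I}} = 0.25$.

(a) Using the Neyman-Pearson test, determine which signal is
transmitted when $x = 0.6$.

(b) Find $P_{\mathrm{II}}$.

(a) Using the result of Prob. 8.6 and Eq. (8.17), the Neyman-Pearson
test is given by

$$e^{(x - 1/2)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda$$

Taking the natural logarithm of the above expression, we get

$$x - \frac{1}{2} \underset{H_0}{\overset{H_1}{\gtrless}} \ln \lambda \quad \text{or} \quad x \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2} + \ln \lambda$$

The critical region $R_1$ is thus

$$R_1 = \left\{x: x > \tfrac{1}{2} + \ln \lambda\right\}$$

Now we must determine $\lambda$ such that $\alpha = P_{\mathrm{I}} = P(D_1 \mid H_0) = 0.25$. By
Eq. (8.1), we have

$$P_{\mathrm{I}} = P(D_1 \mid H_0) = \int_{R_1} f(x \mid H_0)\,dx = \frac{1}{\sqrt{2\pi}} \int_{1/2 + \ln \lambda}^{\infty} e^{-x^2/2}\,dx = 1 - \Phi\left(\frac{1}{2} + \ln \lambda\right)$$

Thus,

$$1 - \Phi\left(\frac{1}{2} + \ln \lambda\right) = 0.25 \quad \text{or} \quad \Phi\left(\frac{1}{2} + \ln \lambda\right) = 0.75$$

From Table A (Appendix A), we find that $\Phi(0.674) = 0.75$. Thus,

$$\frac{1}{2} + \ln \lambda = 0.674 \quad \rightarrow \quad \lambda = 1.19$$

Then the Neyman-Pearson test is

$$x \underset{H_0}{\overset{H_1}{\gtrless}} 0.674$$

Since $x = 0.6 < 0.674$, we determine that signal $s_0(t)$ was
transmitted.

(b) By Eq. (8.2), we have

$$P_{\mathrm{II}} = P(D_0 \mid H_1) = \int_{R_0} f(x \mid H_1)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0.674} e^{-(x-1)^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-0.326} e^{-y^2/2}\,dy = \Phi(-0.326) = 0.3722$$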
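
The Neyman-Pearson threshold can also be obtained without a table lookup. The sketch below assumes the channel of Prob. 8.6 and uses `statistics.NormalDist.inv_cdf` from the Python standard library in place of Table A; the function name `neyman_pearson` is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def neyman_pearson(alpha0):
    """Threshold on x, multiplier lambda, and P_II for the Prob. 8.6 channel."""
    # Under H0 the observation is noise only, so P(x > x_threshold) = alpha0.
    x_threshold = STD_NORMAL.inv_cdf(1.0 - alpha0)
    # Lambda(x) = exp(x - 1/2) evaluated at the threshold gives lambda.
    lam = math.exp(x_threshold - 0.5)
    # Under H1 the observation is 1 + noise; a miss occurs below the threshold.
    p_type2 = STD_NORMAL.cdf(x_threshold - 1.0)
    return x_threshold, lam, p_type2


x_threshold, lam, p_type2 = neyman_pearson(0.25)
print(x_threshold, lam, p_type2)
```

Small differences from the values quoted above in the third decimal place come from rounding in the table lookup.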


8.10. Derive Eq. (8.21). 


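By Eq. (8.19), the Bayes’ risk is

$$\bar{C} = C_{00}P(D_0 \mid H_0)P(H_0) + C_{10}P(D_1 \mid H_0)P(H_0) + C_{01}P(D_0 \mid H_1)P(H_1) + C_{11}P(D_1 \mid H_1)P(H_1)$$

Now we can express

$$P(D_i \mid H_j) = \int_{R_i} f(\mathbf{x} \mid H_j)\,d\mathbf{x} \qquad i = 0, 1; \ j = 0, 1 \tag{8.31}$$

Then $\bar{C}$ can be expressed as

$$\bar{C} = C_{00}P(H_0) \int_{R_0} f(\mathbf{x} \mid H_0)\,d\mathbf{x} + C_{10}P(H_0) \int_{R_1} f(\mathbf{x} \mid H_0)\,d\mathbf{x} + C_{01}P(H_1) \int_{R_0} f(\mathbf{x} \mid H_1)\,d\mathbf{x} + C_{11}P(H_1) \int_{R_1} f(\mathbf{x} \mid H_1)\,d\mathbf{x} \tag{8.32}$$

Since $R_0 \cup R_1 = S$ and $R_0 \cap R_1 = \varnothing$, we can write

$$\int_{R_0} f(\mathbf{x} \mid H_j)\,d\mathbf{x} = \int_{S} f(\mathbf{x} \mid H_j)\,d\mathbf{x} - \int_{R_1} f(\mathbf{x} \mid H_j)\,d\mathbf{x} = 1 - \int_{R_1} f(\mathbf{x} \mid H_j)\,d\mathbf{x}$$

Then Eq. (8.32) becomes

$$\bar{C} = C_{00}P(H_0) + C_{01}P(H_1) + \int_{R_1} \{[(C_{10} - C_{00})P(H_0) f(\mathbf{x} \mid H_0)] - [(C_{01} - C_{11})P(H_1) f(\mathbf{x} \mid H_1)]\}\,d\mathbf{x}$$

The only variable in the above expression is the critical region $R_1$. By
the assumptions [Eq. (8.20)] $C_{10} > C_{00}$ and $C_{01} > C_{11}$, the two terms
inside the brackets in the integral are both positive. Thus, $\bar{C}$ is
minimized if $R_1$ is chosen such that

$$(C_{01} - C_{11})P(H_1) f(\mathbf{x} \mid H_1) > (C_{10} - C_{00})P(H_0) f(\mathbf{x} \mid H_0)$$

for all $\mathbf{x} \in R_1$. That is, we decide to accept $H_1$ if

$$(C_{01} - C_{11})P(H_1) f(\mathbf{x} \mid H_1) > (C_{10} - C_{00})P(H_0) f(\mathbf{x} \mid H_0)$$

In terms of the likelihood ratio, we obtain

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(C_{10} - C_{00})P(H_0)}{(C_{01} - C_{11})P(H_1)}$$

which is Eq. (8.21).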


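8.11. Consider a binary decision problem with the following conditional
pdf’s:

$$f(x \mid H_0) = \frac{1}{2} e^{-|x|}$$
$$f(x \mid H_1) = e^{-2|x|}$$

The Bayes’ costs are given by

$$C_{00} = C_{11} = 0 \qquad C_{01} = 2 \qquad C_{10} = 1$$

(a) Determine the Bayes’ test if $P(H_0) = \frac{2}{3}$ and the associated Bayes’
risk.

(b) Repeat (a) with $P(H_0) = \frac{1}{2}$.

(a) The likelihood ratio is

$$\Lambda(x) = \frac{f(x \mid H_1)}{f(x \mid H_0)} = \frac{e^{-2|x|}}{\frac{1}{2} e^{-|x|}} = 2e^{-|x|} \tag{8.33}$$

By Eq. (8.21), the Bayes’ test is given by

$$2e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(1 - 0)\frac{2}{3}}{(2 - 0)\frac{1}{3}} = 1 \quad \text{or} \quad e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}$$

Taking the natural logarithm of both sides of the last expression
yields

$$|x| \underset{H_0}{\overset{H_1}{\lessgtr}} -\ln\left(\frac{1}{2}\right) = 0.693$$

Thus, the decision regions are given by

$$R_0 = \{x: |x| > 0.693\} \qquad R_1 = \{x: |x| < 0.693\}$$

Then

$$P_{\mathrm{I}} = P(D_1 \mid H_0) = \int_{-0.693}^{0.693} \frac{1}{2} e^{-|x|}\,dx = 2 \int_{0}^{0.693} \frac{1}{2} e^{-x}\,dx = 0.5$$

$$P_{\mathrm{II}} = P(D_0 \mid H_1) = \int_{-\infty}^{-0.693} e^{2x}\,dx + \int_{0.693}^{\infty} e^{-2x}\,dx = 2 \int_{0.693}^{\infty} e^{-2x}\,dx = 0.25$$

and by Eq. (8.19), the Bayes’ risk is

$$\bar{C} = P(D_1 \mid H_0)P(H_0) + 2P(D_0 \mid H_1)P(H_1) = (0.5)\left(\frac{2}{3}\right) + 2(0.25)\left(\frac{1}{3}\right) = 0.5$$

(b) The Bayes’ test is

$$2e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{(1 - 0)\frac{1}{2}}{(2 - 0)\frac{1}{2}} = \frac{1}{2} \quad \text{or} \quad e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{4}$$

Again, taking the natural logarithm of both sides of the last
expression yields

$$|x| \underset{H_0}{\overset{H_1}{\lessgtr}} -\ln\left(\frac{1}{4}\right) = 1.386$$

Thus, the decision regions are given by

$$R_0 = \{x: |x| > 1.386\} \qquad R_1 = \{x: |x| < 1.386\}$$

Then

$$P_{\mathrm{I}} = P(D_1 \mid H_0) = 2 \int_{0}^{1.386} \frac{1}{2} e^{-x}\,dx = 0.75$$

$$P_{\mathrm{II}} = P(D_0 \mid H_1) = 2 \int_{1.386}^{\infty} e^{-2x}\,dx = 0.0625$$

and by Eq. (8.19), the Bayes’ risk is

$$\bar{C} = (0.75)\left(\frac{1}{2}\right) + 2(0.0625)\left(\frac{1}{2}\right) = 0.4375$$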


8.12. Consider the binary decision problem of Prob. 8.11 with the same 
Bayes’ costs. Determine the minimax test. 


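From Eq. (8.33), the likelihood ratio is

$$\Lambda(x) = \frac{f(x \mid H_1)}{f(x \mid H_0)} = 2e^{-|x|}$$

In terms of $P(H_0)$, the Bayes’ test [Eq. (8.21)] becomes

$$2e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}\,\frac{P(H_0)}{1 - P(H_0)} \quad \text{or} \quad e^{-|x|} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{4}\,\frac{P(H_0)}{1 - P(H_0)}$$

Taking the natural logarithm of both sides of the last expression yields

$$|x| \underset{H_0}{\overset{H_1}{\lessgtr}} \ln \frac{4[1 - P(H_0)]}{P(H_0)} = \delta \tag{8.34}$$

For $P(H_0) > 0.8$, $\delta$ becomes negative, and we always decide $H_0$. For
$P(H_0) \le 0.8$, the decision regions are

$$R_0 = \{x: |x| > \delta\} \qquad R_1 = \{x: |x| < \delta\}$$

Then, by setting $C_{00} = C_{11} = 0$, $C_{01} = 2$, and $C_{10} = 1$ in Eq. (8.19), the
minimum Bayes’ risk $\bar{C}^*$ can be expressed as a function of $P(H_0)$ as

$$\bar{C}^*[P(H_0)] = P(H_0) \int_{-\delta}^{\delta} \frac{1}{2} e^{-|x|}\,dx + 2[1 - P(H_0)]\left[\int_{-\infty}^{-\delta} e^{2x}\,dx + \int_{\delta}^{\infty} e^{-2x}\,dx\right]$$
$$= P(H_0) \int_{0}^{\delta} e^{-x}\,dx + 4[1 - P(H_0)] \int_{\delta}^{\infty} e^{-2x}\,dx$$
$$= P(H_0)(1 - e^{-\delta}) + 2[1 - P(H_0)]e^{-2\delta} \tag{8.35}$$

From the definition of $\delta$ [Eq. (8.34)], we have

$$e^{\delta} = \frac{4[1 - P(H_0)]}{P(H_0)}$$

Thus,

$$e^{-\delta} = \frac{P(H_0)}{4[1 - P(H_0)]} \quad \text{and} \quad e^{-2\delta} = \frac{P^2(H_0)}{16[1 - P(H_0)]^2}$$

Substituting these values into Eq. (8.35), we obtain

$$\bar{C}^*[P(H_0)] = \frac{8P(H_0) - 9P^2(H_0)}{8[1 - P(H_0)]}$$

Now the value of $P(H_0)$ which maximizes $\bar{C}^*$ can be obtained by
setting $d\bar{C}^*[P(H_0)]/dP(H_0)$ equal to zero and solving for $P(H_0)$. The
result yields $P(H_0) = \frac{2}{3}$. Substituting this value into Eq. (8.34), we
obtain the following minimax test:

$$|x| \underset{H_0}{\overset{H_1}{\lessgtr}} \ln \frac{4\left(1 - \frac{2}{3}\right)}{\frac{2}{3}} = \ln 2 = 0.69$$

8.13. Suppose that we have $n$ observations $X_i$, $i = 1, \ldots, n$, of radar signals,
and $X_i$ are normal iid r.v.’s under each hypothesis. Under $H_0$, $X_i$ have
mean $\mu_0$ and variance $\sigma^2$, while under $H_1$, $X_i$ have mean $\mu_1$ and
variance $\sigma^2$, and $\mu_1 > \mu_0$. Determine the maximum likelihood test.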


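By Eq. (2.71) for each $X_i$, we have

$$f(x_i \mid H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x_i - \mu_0)^2\right]$$

$$f(x_i \mid H_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x_i - \mu_1)^2\right]$$

Since the $X_i$ are independent, we have

$$f(\mathbf{x} \mid H_0) = \prod_{i=1}^{n} f(x_i \mid H_0) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu_0)^2\right]$$

$$f(\mathbf{x} \mid H_1) = \prod_{i=1}^{n} f(x_i \mid H_1) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu_1)^2\right]$$

With $\mu_1 - \mu_0 > 0$, the likelihood ratio is then given by

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)} = \exp\left\{\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n} 2(\mu_1 - \mu_0)x_i - n(\mu_1^2 - \mu_0^2)\right]\right\}$$

Hence, the maximum likelihood test is given by

$$\exp\left\{\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n} 2(\mu_1 - \mu_0)x_i - n(\mu_1^2 - \mu_0^2)\right]\right\} \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

Taking the natural logarithm of both sides of the above expression
yields

$$\frac{\mu_1 - \mu_0}{\sigma^2} \sum_{i=1}^{n} x_i \underset{H_0}{\overset{H_1}{\gtrless}} \frac{n(\mu_1^2 - \mu_0^2)}{2\sigma^2}$$

or

$$\frac{1}{n} \sum_{i=1}^{n} x_i \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\mu_1 + \mu_0}{2} \tag{8.36}$$

Equation (8.36) indicates that the sample mean $\frac{1}{n}\sum_{i=1}^{n} X_i$
provides enough information about the observations to enable us to
make a decision. Thus, it is called the sufficient statistic for the
maximum likelihood test.

8.14. Consider the same observations $X_i$, $i = 1, \ldots, n$, of radar signals as in
Prob. 8.13, but now, under $H_0$, $X_i$ have zero mean and variance $\sigma_0^2$,
while under $H_1$, $X_i$ have zero mean and variance $\sigma_1^2$, and $\sigma_1^2 > \sigma_0^2$.
Determine the maximum likelihood test.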


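In a similar manner as in Prob. 8.13, we obtain

$$f(\mathbf{x} \mid H_0) = \frac{1}{(2\pi\sigma_0^2)^{n/2}} \exp\left(-\frac{1}{2\sigma_0^2} \sum_{i=1}^{n} x_i^2\right)$$

$$f(\mathbf{x} \mid H_1) = \frac{1}{(2\pi\sigma_1^2)^{n/2}} \exp\left(-\frac{1}{2\sigma_1^2} \sum_{i=1}^{n} x_i^2\right)$$

With $\sigma_1^2 - \sigma_0^2 > 0$, the likelihood ratio is

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)} = \left(\frac{\sigma_0}{\sigma_1}\right)^n \exp\left[\left(\frac{\sigma_1^2 - \sigma_0^2}{2\sigma_0^2\sigma_1^2}\right) \sum_{i=1}^{n} x_i^2\right]$$

and the maximum likelihood test is

$$\left(\frac{\sigma_0}{\sigma_1}\right)^n \exp\left[\left(\frac{\sigma_1^2 - \sigma_0^2}{2\sigma_0^2\sigma_1^2}\right) \sum_{i=1}^{n} x_i^2\right] \underset{H_0}{\overset{H_1}{\gtrless}} 1$$

Taking the natural logarithm of both sides of the above expression
yields

$$\sum_{i=1}^{n} x_i^2 \underset{H_0}{\overset{H_1}{\gtrless}} n\left[\ln\left(\frac{\sigma_1}{\sigma_0}\right)\right]\left(\frac{2\sigma_0^2\sigma_1^2}{\sigma_1^2 - \sigma_0^2}\right)$$

Note that in this case,

$$s(X_1, \ldots, X_n) = \sum_{i=1}^{n} X_i^2$$

is the sufficient statistic for the maximum likelihood test.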


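8.15. In the binary communication system of Prob. 8.6, suppose that we
have $n$ independent observations $X_i = X(t_i)$, $i = 1, \ldots, n$, where
$0 < t_1 < \cdots < t_n \le T$.

(a) Determine the maximum likelihood test.

(b) Find $P_{\mathrm{I}}$ and $P_{\mathrm{II}}$ for $n = 5$ and $n = 10$.

(a) Setting $\mu_0 = 0$ and $\mu_1 = 1$ in Eq. (8.36), the maximum likelihood
test is

$$\frac{1}{n} \sum_{i=1}^{n} x_i \underset{H_0}{\overset{H_1}{\gtrless}} \frac{1}{2}$$

(b) Let

$$Y = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Then by Eqs. (4.132) and (4.136), and the result of Prob. 5.60, we
see that $Y$ is a normal r.v. with zero mean and variance $1/n$ under
$H_0$, and is a normal r.v. with mean 1 and variance $1/n$ under $H_1$.

Thus,

$$P_{\mathrm{I}} = P(D_1 \mid H_0) = \int_{R_1} f_Y(y \mid H_0)\,dy = \frac{\sqrt{n}}{\sqrt{2\pi}} \int_{1/2}^{\infty} e^{-(n/2)y^2}\,dy = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{n}/2}^{\infty} e^{-z^2/2}\,dz = 1 - \Phi(\sqrt{n}/2)$$

$$P_{\mathrm{II}} = P(D_0 \mid H_1) = \int_{R_0} f_Y(y \mid H_1)\,dy = \frac{\sqrt{n}}{\sqrt{2\pi}} \int_{-\infty}^{1/2} e^{-(n/2)(y-1)^2}\,dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-\sqrt{n}/2} e^{-z^2/2}\,dz = \Phi(-\sqrt{n}/2) = 1 - \Phi(\sqrt{n}/2)$$

Note that $P_{\mathrm{I}} = P_{\mathrm{II}}$. Using Table A (Appendix A), we have

$$P_{\mathrm{I}} = P_{\mathrm{II}} = 1 - \Phi(1.118) = 0.1318 \qquad \text{for } n = 5$$
$$P_{\mathrm{I}} = P_{\mathrm{II}} = 1 - \Phi(1.581) = 0.057 \qquad \text{for } n = 10$$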
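
The effect of the number of observations on the error probability can be tabulated with a small script. This is a sketch for the setting of Prob. 8.15 only; the function names `ml_decision` and `error_probability` are hypothetical, and `statistics.NormalDist` replaces Table A.

```python
from statistics import NormalDist


def ml_decision(samples):
    """Maximum likelihood test of Prob. 8.15: accept H1 if the sample mean exceeds 1/2."""
    sample_mean = sum(samples) / len(samples)
    return 1 if sample_mean > 0.5 else 0


def error_probability(n):
    """P_I = P_II = 1 - Phi(sqrt(n)/2) for n independent observations."""
    return 1.0 - NormalDist().cdf(n ** 0.5 / 2.0)


for n in (1, 5, 10, 20):
    print(n, round(error_probability(n), 4))
```

With $n = 1$ the expression reduces to the single-observation result of Prob. 8.6, and the error probability falls steadily as $n$ grows.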


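8.16. In the binary communication system of Prob. 8.6, suppose that $s_0(t)$
and $s_1(t)$ are arbitrary signals and that $n$ observations of the received
signal $x(t)$ are made. Let $n$ samples of $s_0(t)$ and $s_1(t)$ be represented,
respectively, by

$$\mathbf{s}_0 = [s_{01}, s_{02}, \ldots, s_{0n}]^T \quad \text{and} \quad \mathbf{s}_1 = [s_{11}, s_{12}, \ldots, s_{1n}]^T$$

where $T$ denotes "transpose of." Determine the MAP test.

For each $X_i$, we can write

$$f(x_i \mid H_0) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(x_i - s_{0i})^2\right]$$

$$f(x_i \mid H_1) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(x_i - s_{1i})^2\right]$$

Since the noise components are independent, we have

$$f(\mathbf{x} \mid H_j) = \prod_{i=1}^{n} f(x_i \mid H_j) \qquad j = 0, 1$$

and the likelihood ratio is given by

$$\Lambda(\mathbf{x}) = \frac{f(\mathbf{x} \mid H_1)}{f(\mathbf{x} \mid H_0)} = \frac{\prod_{i=1}^{n} \exp\left[-\frac{1}{2}(x_i - s_{1i})^2\right]}{\prod_{i=1}^{n} \exp\left[-\frac{1}{2}(x_i - s_{0i})^2\right]} = \exp\left[\sum_{i=1}^{n} (s_{1i} - s_{0i})x_i - \frac{1}{2} \sum_{i=1}^{n} (s_{1i}^2 - s_{0i}^2)\right]$$

Thus, by Eq. (8.15), the MAP test is given by

$$\exp\left[\sum_{i=1}^{n} (s_{1i} - s_{0i})x_i - \frac{1}{2} \sum_{i=1}^{n} (s_{1i}^2 - s_{0i}^2)\right] \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P(H_0)}{P(H_1)}$$

Taking the natural logarithm of both sides of the above expression
yields

$$\sum_{i=1}^{n} (s_{1i} - s_{0i})x_i \underset{H_0}{\overset{H_1}{\gtrless}} \ln\left[\frac{P(H_0)}{P(H_1)}\right] + \frac{1}{2} \sum_{i=1}^{n} (s_{1i}^2 - s_{0i}^2) \tag{8.38}$$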
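
The test of Eq. (8.38) amounts to correlating the observations with the difference of the two signal vectors and comparing the result with a fixed threshold. A minimal sketch follows, assuming unit-variance independent noise as in Prob. 8.6; the function name `map_decide` and the sample values are hypothetical.

```python
import math


def map_decide(x, s0, s1, p_h0):
    """Apply Eq. (8.38): return 1 to accept H1, 0 to accept H0."""
    if not (len(x) == len(s0) == len(s1)):
        raise ValueError("x, s0 and s1 must have the same length")
    # Left side of (8.38): correlation of x with the signal difference.
    statistic = sum((b - a) * xi for xi, a, b in zip(x, s0, s1))
    # Right side of (8.38): log prior ratio plus half the energy difference.
    threshold = math.log(p_h0 / (1.0 - p_h0)) + 0.5 * sum(
        b * b - a * a for a, b in zip(s0, s1)
    )
    return 1 if statistic > threshold else 0


# A single sample with s0 = 0 and s1 = 1 reproduces the decision of Prob. 8.7.
print(map_decide([0.6], [0.0], [1.0], p_h0=2 / 3))
# Two samples of antipodal signals with equal priors (illustrative values).
print(map_decide([1.1, -0.9], [-1.0, 1.0], [1.0, -1.0], p_h0=0.5))
```

Because the threshold does not depend on the observations, it can be computed once for a given pair of signals and prior.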


SUPPLEMENTARY PROBLEMS 


8.17. Let $(X_1, \ldots, X_n)$ be a random sample of a Bernoulli r.v. $X$ with pmf

$$f(x; p) = p^x (1 - p)^{1-x} \qquad x = 0, 1$$

where it is known that $0 < p \le \frac{1}{2}$. Let

$$H_0: p = \tfrac{1}{2} \qquad H_1: p = p_1 \ \left(< \tfrac{1}{2}\right)$$

and $n = 20$. As a decision test, we use the rule to reject $H_0$ if
$\sum_{i=1}^{n} x_i \le 6$.

(a) Find the power function $g(p)$ of the test.

(b) Find the probability of a Type I error $\alpha$.

(c) Find the probability of a Type II error $\beta$ (i) when $p_1 = \frac{1}{4}$ and (ii)
when $p_1 = \frac{1}{10}$.


8.18. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. $X$ with mean $\mu$
and variance 36. Let

$$H_0: \mu = 50 \qquad H_1: \mu = 55$$

As a decision test, we use the rule to accept $H_0$ if $\bar{x} < 53$, where $\bar{x}$ is
the value of the sample mean.

(a) Find the expression for the critical region $R_1$.

(b) Find $\alpha$ and $\beta$ for $n = 16$.


8.19. Let $(X_1, \ldots, X_n)$ be a random sample of a normal r.v. $X$ with mean $\mu$
and variance 100. Let

$$H_0: \mu = 50 \qquad H_1: \mu = 55$$

As a decision test, we use the rule that we reject $H_0$ if $\bar{x} \ge c$. Find the
value of $c$ and sample size $n$ such that $\alpha = 0.025$ and $\beta = 0.05$.


8.20. Let $X$ be a normal r.v. with zero mean and variance $\sigma^2$. Let

$$H_0: \sigma^2 = 1 \qquad H_1: \sigma^2 = 4$$

Determine the maximum likelihood test.


8.21. Consider the binary decision problem of Prob. 8.20. Let $P(H_0) = \frac{2}{3}$
and $P(H_1) = \frac{1}{3}$. Determine the MAP test.


8.22. Consider the binary communication system of Prob. 8.6.

(a) Construct a Neyman-Pearson test for the case where $\alpha = 0.1$.

(b) Find $\beta$.


8.23. Consider the binary decision problem of Prob. 8.11. Determine the
Bayes’ test if $P(H_0) = 0.25$ and the Bayes’ costs are


ANSWERS TO SUPPLEMENTARY PROBLEMS 


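8.17. (a) $g(p) = \sum_{k=0}^{6} \binom{20}{k} p^k (1 - p)^{20-k} \qquad 0 < p \le \frac{1}{2}$
(b) $\alpha = 0.0577$; (c) (i) $\beta = 0.2142$, (ii) $\beta = 0.0024$

8.18. (a) $R_1 = \{(x_1, \ldots, x_n): \bar{x} \ge 53\}$ where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
(b) $\alpha = 0.0228$, $\beta = 0.0913$

8.19. $c = 52.718$, $n = 52$

8.20. $|x| \underset{H_0}{\overset{H_1}{\gtrless}} 1.36$

8.21. $|x| \underset{H_0}{\overset{H_1}{\gtrless}} 1.923$

8.22. (a) $x \underset{H_0}{\overset{H_1}{\gtrless}} 1.282$; (b) $\beta = 0.6111$

8.23. $|x| \underset{H_0}{\overset{H_1}{\lessgtr}} 1.10$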


CHAPTER 9 


Queueing Theory 


9.1 Introduction 


Queueing theory deals with the study of queues (waiting lines). Queues 
abound in practical situations. The earliest use of queueing theory was in the 
design of a telephone system. Applications of queueing theory are found in 
fields as seemingly diverse as traffic control, hospital management, and 
time-shared computer system design. In this chapter, we present an 
elementary queueing theory. 


9.2 Queueing Systems 


A. Description: 


A simple queueing system is shown in Fig. 9-1. Customers arrive randomly 
at an average rate of $\lambda_a$. Upon arrival, they are served without delay if there
are available servers; otherwise, they are made to wait in the queue until it is 
their turn to be served. Once served, they are assumed to leave the system. 
We will be interested in determining such quantities as the average number 
of customers in the system, the average time a customer spends in the 
system, the average time spent waiting in the queue, and so on. 


(Diagram: Arrivals, Queue, Service, Departures)


Fig. 9-1 A simple queueing system. 


The description of any queueing system requires the specification of three 
parts: 


1. The arrival process 

2. The service mechanism, such as the number of servers and service-time 
distribution 

3. The queue discipline (for example, first-come, first-served) 


B. Classification: 


The notation A/B/s/K is used to classify a queueing system, where A
specifies the type of arrival process, B denotes the service-time distribution,
s specifies the number of servers, and K denotes the capacity of the system,
that is, the maximum number of customers that can be accommodated. If K
is not specified, it is assumed that the capacity of the system is unlimited.
For example, an M/M/2 queueing system (M stands for Markov) is one with
Poisson arrivals (or exponential interarrival time distribution), exponential
service-time distribution, and 2 servers. An M/G/1 queueing system has
Poisson arrivals, general service-time distribution, and a single server. A
special case is the M/D/1 queueing system, where D stands for constant
(deterministic) service time. Examples of queueing systems with limited 
capacity are telephone systems with limited trunks, hospital emergency 
rooms with limited beds, and airline terminals with limited space in which to 
park aircraft for loading and unloading. In each case, customers who arrive 
when the system is saturated are denied entrance and are lost. 


C. Useful Formulas: 

Some basic quantities of queueing systems are

$L$: the average number of customers in the system
$L_q$: the average number of customers waiting in the queue
$L_s$: the average number of customers in service
$W$: the average amount of time that a customer spends in the system
$W_q$: the average amount of time that a customer spends waiting in the
queue
$W_s$: the average amount of time that a customer spends in service

Many useful relationships between the above and other quantities of
interest can be obtained by using the following basic cost identity:

Assume that entering customers are required to pay an entrance fee
(according to some rule) to the system. Then we have

$$\text{Average rate at which the system earns} = \lambda_a \times \text{average amount an entering customer pays} \tag{9.1}$$

where $\lambda_a$ is the average arrival rate of entering customers

$$\lambda_a = \lim_{t \to \infty} \frac{X(t)}{t}$$

and $X(t)$ denotes the number of customer arrivals by time $t$.
If we assume that each customer pays one dollar per unit time while in the
system, Eq. (9.1) yields

$$L = \lambda_a W \tag{9.2}$$

Equation (9.2) is sometimes known as Little’s formula.

Similarly, if we assume that each customer pays one dollar per unit time while in
the queue, Eq. (9.1) yields

$$L_q = \lambda_a W_q \tag{9.3}$$

If we assume that each customer pays one dollar per unit time while in service, Eq.
(9.1) yields

$$L_s = \lambda_a W_s \tag{9.4}$$

Note that Eqs. (9.2) to (9.4) are valid for almost all queueing systems,
regardless of the arrival process, the number of servers, or queueing
discipline.


9.3 Birth-Death Process 


We say that the queueing system is in state $S_n$ if there are $n$ customers in the
system, including those being served. Let $N(t)$ be the Markov process that
takes on the value $n$ when the queueing system is in state $S_n$, with the
following assumptions:

1. If the system is in state $S_n$, it can make transitions only to $S_{n-1}$ or $S_{n+1}$,
$n \ge 1$; that is, either a customer completes service and leaves the
system or, while the present customer is still being serviced, another
customer arrives at the system; from $S_0$, the next state can only be $S_1$.

2. If the system is in state $S_n$ at time $t$, the probability of a transition to $S_{n+1}$
in the time interval $(t, t + \Delta t)$ is $a_n \Delta t$. We refer to $a_n$ as the arrival
parameter (or the birth parameter).

3. If the system is in state $S_n$ at time $t$, the probability of a transition to $S_{n-1}$
in the time interval $(t, t + \Delta t)$ is $d_n \Delta t$. We refer to $d_n$ as the
departure parameter (or the death parameter).

The process $N(t)$ is sometimes referred to as the birth-death process.
Let $p_n(t)$ be the probability that the queueing system is in state $S_n$ at time
$t$; that is,

$$p_n(t) = P\{N(t) = n\} \tag{9.5}$$

Then we have the following fundamental recursive equations for $N(t)$ (Prob.
9.2):

$$p_n'(t) = -(a_n + d_n)p_n(t) + a_{n-1}p_{n-1}(t) + d_{n+1}p_{n+1}(t) \qquad n \ge 1$$
$$p_0'(t) = -(a_0 + d_0)p_0(t) + d_1 p_1(t) \tag{9.6}$$

Assume that in the steady state we have

$$\lim_{t \to \infty} p_n(t) = p_n \tag{9.7}$$

and setting $p_0'(t)$ and $p_n'(t) = 0$ in Eqs. (9.6), we obtain the following steady-
state recursive equation:

$$(a_n + d_n)p_n = a_{n-1}p_{n-1} + d_{n+1}p_{n+1} \qquad n \ge 1 \tag{9.8}$$

and for the special case with $d_0 = 0$,

$$a_0 p_0 = d_1 p_1 \tag{9.9}$$

Equations (9.8) and (9.9) are also known as the steady-state equilibrium
equations. The state transition diagram for the birth-death process is shown
in Fig. 9-2.

Fig. 9-2 State transition diagram for the birth-death process.

Solving Eqs. (9.8) and (9.9) in terms of $p_0$, we obtain

$$p_1 = \frac{a_0}{d_1} p_0$$
$$p_2 = \frac{a_0 a_1}{d_1 d_2} p_0$$
$$\vdots$$
$$p_n = \frac{a_0 a_1 \cdots a_{n-1}}{d_1 d_2 \cdots d_n} p_0 \tag{9.10}$$

where $p_0$ can be determined from the fact that

$$\sum_{n=0}^{\infty} p_n = \left(1 + \frac{a_0}{d_1} + \frac{a_0 a_1}{d_1 d_2} + \cdots\right) p_0 = 1 \tag{9.11}$$

provided that the summation in parentheses converges to a finite value.
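
The product form of Eqs. (9.10) and (9.11) translates directly into a short numerical routine. The sketch below truncates the infinite sum at a finite state `n_max`, which is an approximation that is only reasonable when the neglected terms are negligible; the function name `steady_state` and the callables `arrival` and `departure` are hypothetical.

```python
def steady_state(arrival, departure, n_max):
    """Approximate p_0, ..., p_n_max from Eqs. (9.10)-(9.11), truncated at n_max.

    arrival(n) returns a_n and departure(n) returns d_n.
    """
    weights = [1.0]  # ratio p_n / p_0, starting with n = 0
    for n in range(1, n_max + 1):
        weights.append(weights[-1] * arrival(n - 1) / departure(n))
    total = sum(weights)  # truncated version of the sum in Eq. (9.11)
    return [w / total for w in weights]


# State-independent rates a_n = 1 and d_n = 2 (illustrative values).
p = steady_state(lambda n: 1.0, lambda n: 2.0, n_max=200)
print(p[0], p[1], p[2])
```

For these rates the terms decay geometrically, so the truncation error at `n_max=200` is far below floating-point precision.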


9.4 The M/M/1 Queueing System 


In the M/M/1 queueing system, the arrival process is the Poisson process
with rate $\lambda$ (the mean arrival rate) and the service time is exponentially
distributed with parameter $\mu$ (the mean service rate). Then the process M(t)
describing the state of the M/M/1 queueing system at time t is a birth-death
process with the following state-independent parameters:

$$a_n = \lambda \quad n \ge 0 \qquad d_n = \mu \quad n \ge 1 \qquad (9.12)$$

Then from Eqs. (9.10) and (9.11), we obtain (Prob. 9.3)

$$p_0 = 1 - \frac{\lambda}{\mu} = 1 - \rho \qquad (9.13)$$
$$p_n = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n = (1 - \rho)\rho^n \qquad (9.14)$$

where $\rho = \lambda/\mu < 1$, which implies that the server, on the average, must
process the customers faster than their average arrival rate; otherwise, the
queue length (the number of customers waiting in the queue) tends to
infinity. The ratio $\rho = \lambda/\mu$ is referred to as the traffic intensity of
the system, defined as

$$\text{Traffic intensity} = \frac{\text{mean service time}}{\text{mean interarrival time}} = \frac{\text{mean arrival rate}}{\text{mean service rate}}$$


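The average number of customers in the system is given by (Prob. 9.4)

$$L = \frac{\rho}{1 - \rho} = \frac{\lambda}{\mu - \lambda} \qquad (9.15)$$

Then, setting $\lambda_a = \lambda$ in Eqs. (9.2) to (9.4), we obtain (Prob. 9.5)

$$W = \frac{1}{\mu - \lambda} = \frac{1}{\mu(1 - \rho)} \qquad (9.16)$$
$$W_q = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{\rho}{\mu(1 - \rho)} \qquad (9.17)$$
$$L_q = \frac{\lambda^2}{\mu(\mu - \lambda)} = \frac{\rho^2}{1 - \rho} \qquad (9.18)$$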
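As a quick numerical companion to Eqs. (9.15) to (9.18), the following Python sketch evaluates the four M/M/1 measures. The function name and the dictionary keys are illustrative assumptions; the sample rates are those of the watch repair shop in Prob. 9.7 (one arrival per 10 minutes, mean service time 8 minutes).

```python
def mm1_measures(lam, mu):
    """Steady-state measures of an M/M/1 queue, Eqs. (9.15)-(9.18).

    lam: mean arrival rate, mu: mean service rate (same time unit).
    """
    if not 0 < lam < mu:
        raise ValueError("steady state requires 0 < lam < mu")
    rho = lam / mu                 # traffic intensity
    L = rho / (1 - rho)            # Eq. (9.15)
    W = 1 / (mu - lam)             # Eq. (9.16)
    Wq = lam / (mu * (mu - lam))   # Eq. (9.17)
    Lq = rho ** 2 / (1 - rho)      # Eq. (9.18)
    return {"rho": rho, "L": L, "W": W, "Wq": Wq, "Lq": Lq}


# Rates per minute, taken from Prob. 9.7(a).
m = mm1_measures(1 / 10, 1 / 8)
print(m)
```

The results can be compared with the hand calculation in Prob. 9.7(a).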


9.5 The M/M/s Queueing System 


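In the M/M/s queueing system, the arrival process is the Poisson process
with rate $\lambda$ and each of the s servers has an exponential service time with
parameter $\mu$. In this case, the process M(t) describing the state of the M/M/s
queueing system at time t is a birth-death process with the following
parameters:

$$a_n = \lambda \quad n \ge 0 \qquad d_n = \begin{cases} n\mu & 0 < n < s \\ s\mu & n \ge s \end{cases} \qquad (9.19)$$

Note that the departure parameter $d_n$ is state dependent. Then, from Eqs.
(9.10) and (9.11) we obtain (Prob. 9.10)

$$p_0 = \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{(s\rho)^s}{s!\,(1 - \rho)}\right]^{-1} \qquad (9.20)$$

$$p_n = \begin{cases} \dfrac{(s\rho)^n}{n!}\,p_0 & n < s \\[2mm] \dfrac{\rho^n s^s}{s!}\,p_0 & n \ge s \end{cases} \qquad (9.21)$$

where $\rho = \lambda/(s\mu) < 1$. Note that the ratio $\rho = \lambda/(s\mu)$ is the traffic intensity of
the M/M/s queueing system. The average number of customers in the system
and the average number of customers in the queue are given, respectively, by
(Prob. 9.12)

$$L = s\rho + \frac{\rho\,(s\rho)^s}{s!\,(1 - \rho)^2}\,p_0 \qquad (9.22)$$
$$L_q = \frac{\rho\,(s\rho)^s}{s!\,(1 - \rho)^2}\,p_0 = L - s\rho \qquad (9.23)$$

The quantities W and $W_q$ are then given by

$$W = \frac{L}{\lambda} \qquad (9.24)$$
$$W_q = \frac{L_q}{\lambda} = W - \frac{1}{\mu} \qquad (9.25)$$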


9.6 The M/M/1/K Queueing System 


In the M/M/1/K queueing system, the capacity of the system is limited to K
customers. When the system reaches its capacity, the effect is to reduce the
arrival rate to zero until such time as a customer is served to again make
queue space available. Thus, the M/M/1/K queueing system can be modeled
as a birth-death process with the following parameters:

$$a_n = \begin{cases} \lambda & 0 \le n < K \\ 0 & n \ge K \end{cases} \qquad d_n = \mu \quad n \ge 1 \qquad (9.26)$$

Then, from Eqs. (9.10) and (9.11) we obtain (Prob. 9.14)

$$p_0 = \frac{1 - (\lambda/\mu)}{1 - (\lambda/\mu)^{K+1}} = \frac{1 - \rho}{1 - \rho^{K+1}} \qquad (9.27)$$
$$p_n = \left(\frac{\lambda}{\mu}\right)^n p_0 = \frac{(1 - \rho)\rho^n}{1 - \rho^{K+1}} \qquad n = 1, \ldots, K \qquad (9.28)$$

where $\rho = \lambda/\mu$. It is important to note that it is no longer necessary that the
traffic intensity $\rho = \lambda/\mu$ be less than 1. Customers are denied service when
the system is in state K. Since the fraction of arrivals that actually enter the
system is $1 - p_K$, the effective arrival rate is given by

$$\lambda_e = \lambda(1 - p_K) \qquad (9.29)$$

The average number of customers in the system is given by (Prob. 9.15)

$$L = \rho\,\frac{1 - (K + 1)\rho^K + K\rho^{K+1}}{(1 - \rho)(1 - \rho^{K+1})} \qquad \rho = \frac{\lambda}{\mu} \qquad (9.30)$$

Then, setting $\lambda_a = \lambda_e$ in Eqs. (9.2) to (9.4), we obtain

$$W = \frac{L}{\lambda_e} = \frac{L}{\lambda(1 - p_K)} \qquad (9.31)$$
$$W_q = W - \frac{1}{\mu} \qquad (9.32)$$
$$L_q = \lambda_e W_q = \lambda(1 - p_K)\,W_q \qquad (9.33)$$
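The finite-capacity formulas above can be evaluated directly. The Python sketch below is an illustration under stated assumptions: it sums the unnormalized weights $\rho^n$ rather than using the closed forms (9.27) and (9.30), so that the case $\rho = 1$ needs no special handling, and it uses the effective arrival rate of Eq. (9.29) in Little's formula. The function name and sample numbers are hypothetical.

```python
def mm1k_measures(lam, mu, K):
    """Steady-state measures of an M/M/1/K queue by direct summation.

    Probabilities follow Eq. (9.28); lam_e follows Eq. (9.29);
    W, Wq, Lq follow Eqs. (9.31)-(9.33).
    """
    rho = lam / mu
    weights = [rho ** n for n in range(K + 1)]   # p_n / p_0
    total = sum(weights)
    p = [w / total for w in weights]
    L = sum(n * pn for n, pn in enumerate(p))    # E(N)
    lam_e = lam * (1 - p[K])                     # effective arrival rate
    W = L / lam_e
    Wq = W - 1 / mu
    Lq = lam_e * Wq
    return {"p": p, "L": L, "lam_e": lam_e, "W": W, "Wq": Wq, "Lq": Lq}


# Illustrative values (assumed): 4 arrivals per hour, 6 services per hour, K = 3.
r = mm1k_measures(4.0, 6.0, 3)
print(r["p"][3], r["L"])
```

Because the sum is finite, the same function also works for $\rho \ge 1$, in line with the remark that the traffic intensity need not be less than 1.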


9.7 The M/M/s/K Queueing System 


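Similarly, the M/M/s/K queueing system can be modeled as a birth-death
process with the following parameters:

$$a_n = \begin{cases} \lambda & 0 \le n < K \\ 0 & n \ge K \end{cases} \qquad d_n = \begin{cases} n\mu & 0 < n < s \\ s\mu & n \ge s \end{cases} \qquad (9.34)$$

Then, from Eqs. (9.10) and (9.11), we obtain (Prob. 9.17)

$$p_0 = \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{(s\rho)^s\,(1 - \rho^{K-s+1})}{s!\,(1 - \rho)}\right]^{-1} \qquad (9.35)$$

$$p_n = \begin{cases} \dfrac{(s\rho)^n}{n!}\,p_0 & n < s \\[2mm] \dfrac{\rho^n s^s}{s!}\,p_0 & s \le n \le K \end{cases} \qquad (9.36)$$

where $\rho = \lambda/(s\mu)$. Note that the expression for $p_n$ is identical in form to that
for the M/M/s system, Eq. (9.21). They differ only in the $p_0$ term. Again, it is
not necessary that $\rho = \lambda/(s\mu)$ be less than 1. The average number of
customers in the queue is given by (Prob. 9.18)

$$L_q = p_0\,\frac{\rho\,(s\rho)^s}{s!\,(1 - \rho)^2}\left\{1 - [1 + (1 - \rho)(K - s)]\,\rho^{K-s}\right\} \qquad (9.37)$$

The average number of customers in the system is

$$L = L_q + \frac{\lambda_e}{\mu} = L_q + \frac{\lambda}{\mu}(1 - p_K) \qquad (9.38)$$

The quantities W and $W_q$ are given by

$$W = \frac{L}{\lambda_e} = W_q + \frac{1}{\mu} \qquad (9.39)$$
$$W_q = \frac{L_q}{\lambda_e} = \frac{L_q}{\lambda(1 - p_K)} \qquad (9.40)$$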


SOLVED PROBLEMS 


9.1. Deduce the basic cost identity (9.1). 


Let T be a fixed large number. The amount of money earned by the
system by time T can be computed by multiplying the average rate at
which the system earns by the length of time T. On the other hand, it
can also be computed by multiplying the average amount paid by an
entering customer by the average number of customers entering by
time T, which is equal to $\lambda_a T$, where $\lambda_a$ is the average arrival rate of
entering customers. Thus, we have

Average rate at which the system earns × T = average amount paid
by an entering customer × ($\lambda_a T$)

Dividing both sides by T (and letting T → ∞), we obtain Eq. (9.1).
9.2. Derive Eq. (9.6). 


From properties 1 to 3 of the birth-death process M(t), we see that at
time t + Δt the system can be in state $S_n$ in three ways:

1. By being in state $S_n$ at time t and no transition occurring in the time
interval (t, t + Δt). This happens with probability
$(1 - a_n \Delta t)(1 - d_n \Delta t) \approx 1 - (a_n + d_n)\Delta t$ [by neglecting the second-order
effect $a_n d_n (\Delta t)^2$].

2. By being in state $S_{n-1}$ at time t and a transition to $S_n$ occurring in
the time interval (t, t + Δt). This happens with probability $a_{n-1}\Delta t$.

3. By being in state $S_{n+1}$ at time t and a transition to $S_n$ occurring in
the time interval (t, t + Δt). This happens with probability $d_{n+1}\Delta t$.

Let

$$p_n(t) = P[M(t) = n]$$

Then, using the Markov property of M(t), we obtain

$$p_n(t + \Delta t) = [1 - (a_n + d_n)\Delta t]\,p_n(t) + a_{n-1}\Delta t\,p_{n-1}(t) + d_{n+1}\Delta t\,p_{n+1}(t) \qquad n \ge 1$$
$$p_0(t + \Delta t) = [1 - (a_0 + d_0)\Delta t]\,p_0(t) + d_1 \Delta t\,p_1(t)$$

Rearranging the above relations gives

$$\frac{p_n(t + \Delta t) - p_n(t)}{\Delta t} = -(a_n + d_n)\,p_n(t) + a_{n-1}\,p_{n-1}(t) + d_{n+1}\,p_{n+1}(t) \qquad n \ge 1$$
$$\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -(a_0 + d_0)\,p_0(t) + d_1\,p_1(t)$$

Letting Δt → 0, we obtain

$$p_n'(t) = -(a_n + d_n)\,p_n(t) + a_{n-1}\,p_{n-1}(t) + d_{n+1}\,p_{n+1}(t) \qquad n \ge 1$$
$$p_0'(t) = -(a_0 + d_0)\,p_0(t) + d_1\,p_1(t)$$
9.3. Derive Eqs. (9.13) and (9.14). 


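Setting $a_n = \lambda$, $d_0 = 0$, and $d_n = \mu$ in Eq. (9.10), we get

$$p_1 = \frac{\lambda}{\mu}\,p_0 = \rho\,p_0$$
$$p_2 = \left(\frac{\lambda}{\mu}\right)^2 p_0 = \rho^2 p_0$$
$$\vdots$$
$$p_n = \left(\frac{\lambda}{\mu}\right)^n p_0 = \rho^n p_0$$

where $p_0$ is determined by equating

$$\sum_{n=0}^{\infty} p_n = p_0 \sum_{n=0}^{\infty} \rho^n = \frac{p_0}{1 - \rho} = 1 \qquad |\rho| < 1$$

from which we obtain

$$p_0 = 1 - \rho = 1 - \frac{\lambda}{\mu}$$
$$p_n = \left(\frac{\lambda}{\mu}\right)^n p_0 = (1 - \rho)\rho^n = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n$$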
HM oe 


9.4. Derive Eq. (9.15). 


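Since $p_n$ is the steady-state probability that the system contains exactly
n customers, using Eq. (9.14), the average number of customers in the
M/M/1 queueing system is given by

$$L = \sum_{n=0}^{\infty} n\,p_n = \sum_{n=0}^{\infty} n(1 - \rho)\rho^n = (1 - \rho)\sum_{n=0}^{\infty} n\rho^n \qquad (9.41)$$

where $\rho = \lambda/\mu < 1$. Using the algebraic identity

$$\sum_{n=0}^{\infty} n x^n = \frac{x}{(1 - x)^2} \qquad |x| < 1 \qquad (9.42)$$

we obtain

$$L = \frac{\rho}{1 - \rho} = \frac{\lambda/\mu}{1 - \lambda/\mu} = \frac{\lambda}{\mu - \lambda}$$

9.5. Derive Eqs. (9.16) to (9.18).

Since $\lambda_a = \lambda$, by Eqs. (9.2) and (9.15), we get

$$W = \frac{L}{\lambda} = \frac{1}{\mu - \lambda} = \frac{1}{\mu(1 - \rho)}$$

which is Eq. (9.16). Next, by definition,

$$W_q = W - W_s \qquad (9.43)$$

where $W_s = 1/\mu$, that is, the average service time. Thus,

$$W_q = \frac{1}{\mu - \lambda} - \frac{1}{\mu} = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{\rho}{\mu(1 - \rho)}$$

which is Eq. (9.17). Finally, by Eq. (9.3),

$$L_q = \lambda W_q = \frac{\lambda^2}{\mu(\mu - \lambda)} = \frac{\rho^2}{1 - \rho}$$

which is Eq. (9.18).


9.6. Let $W_a$ denote the amount of time an arbitrary customer spends in the
M/M/1 queueing system. Find the distribution of $W_a$.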


We have

$$P\{W_a \le a\} = \sum_{n=0}^{\infty} P\{W_a \le a \mid n \text{ in the system when the customer arrives}\}\; P\{n \text{ in the system when the customer arrives}\} \qquad (9.44)$$

where n is the number of customers in the system. Now consider the
amount of time $W_a$ that this customer will spend in the system when
there are already n customers when he or she arrives. When n = 0,
$W_a$ is just the customer's own service time. When n ≥ 1, there will be one
customer in service and n − 1 customers waiting in line ahead of this
customer's arrival. The customer in service might have been in service
for some time, but because of the memoryless property of the
exponential distribution of the service time, it follows that (see Prob.
2.58) the arriving customer would have to wait an exponential amount
of time with parameter $\mu$ for this customer to complete service. In
addition, the customer also would have to wait an exponential amount
of time for each of the other n − 1 customers in line. Thus, adding his
or her own service time, the amount of time $W_a$ that the customer will
spend in the system when there are already n customers when he or
she arrives is the sum of n + 1 iid exponential r.v.'s with parameter $\mu$.
Then by Prob. 4.39, we see that this r.v. is a gamma r.v. with
parameters (n + 1, $\mu$). Thus, by Eq. (2.65),

$$P\{W_a \le a \mid n \text{ in the system when customer arrives}\} = \int_0^a \mu e^{-\mu t}\,\frac{(\mu t)^n}{n!}\,dt$$

From Eq. (9.14),

$$P\{n \text{ in the system when customer arrives}\} = p_n = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n$$

Hence, by Eq. (9.44),

$$F_{W_a}(a) = P\{W_a \le a\} = \sum_{n=0}^{\infty} \int_0^a \mu e^{-\mu t}\,\frac{(\mu t)^n}{n!}\left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n dt$$
$$= \int_0^a (\mu - \lambda)\,e^{-\mu t} \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!}\,dt$$
$$= \int_0^a (\mu - \lambda)\,e^{-(\mu - \lambda)t}\,dt = 1 - e^{-(\mu - \lambda)a} \qquad (9.45)$$

Thus, by Eq. (2.61), $W_a$ is an exponential r.v. with parameter $\mu - \lambda$.
Note that from Eq. (2.62), $E(W_a) = 1/(\mu - \lambda)$, which agrees with Eq.
(9.16), since $W = E(W_a)$.


9.7. Customers arrive at a watch repair shop according to a Poisson 
process at a rate of one per every 10 minutes, and the service time is 
an exponential r.v. with mean 8 minutes. 


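(a) Find the average number of customers L, the average time a
customer spends in the shop W, and the average time a customer
spends in waiting for service $W_q$.

(b) Suppose that the arrival rate of the customers increases 10 percent.
Find the corresponding changes in L, W, and $W_q$.

(a) The watch repair shop service can be modeled as an M/M/1
queueing system with $\lambda = \frac{1}{10}$, $\mu = \frac{1}{8}$. Thus, from Eqs. (9.15), (9.16),
and (9.43), we have

$$L = \frac{\lambda}{\mu - \lambda} = \frac{\frac{1}{10}}{\frac{1}{8} - \frac{1}{10}} = 4$$
$$W = \frac{1}{\mu - \lambda} = \frac{1}{\frac{1}{8} - \frac{1}{10}} = 40 \text{ minutes}$$
$$W_q = W - W_s = 40 - 8 = 32 \text{ minutes}$$

(b) Now $\lambda = \frac{1}{9}$, $\mu = \frac{1}{8}$. Then

$$L = \frac{\lambda}{\mu - \lambda} = \frac{\frac{1}{9}}{\frac{1}{8} - \frac{1}{9}} = 8$$
$$W = \frac{1}{\mu - \lambda} = \frac{1}{\frac{1}{8} - \frac{1}{9}} = 72 \text{ minutes}$$
$$W_q = W - W_s = 72 - 8 = 64 \text{ minutes}$$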


It can be seen that an increase of 10 percent in the customer arrival 
rate doubles the average number of customers in the system. The 
average time a customer spends in queue is also doubled. 


9.8. A drive-in banking service is modeled as an M/M/1 queueing system 
with customer arrival rate of 2 per minute. It is desired to have fewer 
than 5 customers line up 99 percent of the time. How fast should the 


service rate be? 


From Eq. (9.14),

$$P\{\text{5 or more customers in the system}\} = \sum_{n=5}^{\infty} p_n = \sum_{n=5}^{\infty} (1 - \rho)\rho^n = \rho^5 \qquad \rho = \frac{\lambda}{\mu}$$

In order to have fewer than 5 customers line up 99 percent of the time,
we require that this probability be less than 0.01. Thus,

$$\rho^5 = \left(\frac{\lambda}{\mu}\right)^5 \le 0.01$$

from which we obtain

$$\mu^5 \ge \frac{\lambda^5}{0.01} = \frac{2^5}{0.01} = 3200 \qquad \text{or} \qquad \mu \ge 5.024$$

Thus, to meet the requirements, the average service rate must be at
least 5.024 customers per minute.


9.9. People arrive at a telephone booth according to a Poisson process at an
average rate of 12 per hour, and the average time for each call is an
exponential r.v. with mean 2 minutes.

(a) What is the probability that an arriving customer will find the
telephone booth occupied?

(b) It is the policy of the telephone company to install additional
booths if customers wait an average of 3 or more minutes for the
phone. Find the average arrival rate needed to justify a second
booth.

(a) The telephone service can be modeled as an M/M/1 queueing
system with $\lambda = \frac{1}{5}$, $\mu = \frac{1}{2}$, and $\rho = \lambda/\mu = \frac{2}{5}$. The probability that an
arriving customer will find the telephone occupied is P(L > 0),
where L here denotes the number of customers in the system. Thus,
from Eq. (9.13),

$$P(L > 0) = 1 - p_0 = 1 - (1 - \rho) = \rho = \frac{2}{5} = 0.4$$

(b) From Eq. (9.17),

$$W_q = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{\lambda}{0.5(0.5 - \lambda)} \ge 3$$

from which we obtain $\lambda \ge 0.3$ per minute. Thus, the required average
arrival rate to justify the second booth is 18 per hour.


9.10. Derive Eqs. (9.20) and (9.21). 
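From Eqs. (9.19) and (9.10), we have

$$p_n = p_0 \prod_{k=0}^{n-1} \frac{\lambda}{(k + 1)\mu} = p_0 \left(\frac{\lambda}{\mu}\right)^n \frac{1}{n!} \qquad n < s \qquad (9.46)$$

$$p_n = p_0 \prod_{k=0}^{s-1} \frac{\lambda}{(k + 1)\mu} \prod_{k=s}^{n-1} \frac{\lambda}{s\mu} = p_0 \left(\frac{\lambda}{\mu}\right)^n \frac{1}{s!\,s^{n-s}} \qquad n \ge s \qquad (9.47)$$

Let $\rho = \lambda/(s\mu)$. Then Eqs. (9.46) and (9.47) can be rewritten as

$$p_n = \begin{cases} \dfrac{(s\rho)^n}{n!}\,p_0 & n < s \\[2mm] \dfrac{\rho^n s^s}{s!}\,p_0 & n \ge s \end{cases}$$

which is Eq. (9.21). From Eq. (9.11), $p_0$ is obtained by equating

$$\sum_{n=0}^{\infty} p_n = p_0 \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \sum_{n=s}^{\infty} \frac{\rho^n s^s}{s!}\right] = 1$$

Using the summation formula

$$\sum_{n=k}^{\infty} x^n = \frac{x^k}{1 - x} \qquad |x| < 1 \qquad (9.48)$$

we obtain Eq. (9.20); that is,

$$p_0 = \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{s^s}{s!} \sum_{n=s}^{\infty} \rho^n\right]^{-1} = \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{(s\rho)^s}{s!\,(1 - \rho)}\right]^{-1}$$

provided $\rho = \lambda/(s\mu) < 1$.


9.11. Consider an M/M/s queueing system. Find the probability that an
arriving customer is forced to join the queue.

An arriving customer is forced to join the queue when all servers are
busy, that is, when the number of customers in the system is equal to
or greater than s. Thus, using Eqs. (9.20) and (9.21), we get

$$P(\text{a customer is forced to join queue}) = \sum_{n=s}^{\infty} p_n = p_0\,\frac{s^s}{s!} \sum_{n=s}^{\infty} \rho^n = p_0\,\frac{(s\rho)^s}{s!\,(1 - \rho)}$$

$$= \frac{\dfrac{(s\rho)^s}{s!\,(1 - \rho)}}{\displaystyle\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{(s\rho)^s}{s!\,(1 - \rho)}} \qquad (9.49)$$

Equation (9.49) is sometimes referred to as Erlang's delay (or C)
formula and denoted by $C(s, \lambda/\mu)$. Equation (9.49) is widely used in
telephone systems and gives the probability that no trunk (server) is
available for an incoming call (arriving customer) in a system of s
trunks.


9.12. Derive Eqs. (9.22) and (9.23).

Equation (9.21) can be rewritten as

$$p_n = \begin{cases} \dfrac{(s\rho)^n}{n!}\,p_0 & n \le s \\[2mm] \dfrac{\rho^n s^s}{s!}\,p_0 & n > s \end{cases}$$

Then the average number of customers in the system is

$$L = E(N) = \sum_{n=0}^{\infty} n\,p_n = p_0 \left[\sum_{n=0}^{s} n\,\frac{(s\rho)^n}{n!} + \sum_{n=s+1}^{\infty} n\,\frac{\rho^n s^s}{s!}\right]$$
$$= p_0 \left[s\rho \sum_{n=1}^{s} \frac{(s\rho)^{n-1}}{(n - 1)!} + \frac{s^s}{s!} \sum_{n=s+1}^{\infty} n\rho^n\right]$$
$$= p_0 \left\{s\rho \sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{s^s}{s!}\left[\sum_{n=0}^{\infty} n\rho^n - \sum_{n=0}^{s} n\rho^n\right]\right\}$$

Using the summation formulas,

$$\sum_{n=0}^{\infty} n x^n = \frac{x}{(1 - x)^2} \qquad |x| < 1 \qquad (9.50)$$
$$\sum_{n=0}^{k} n x^n = \frac{x\,[k x^{k+1} - (k + 1)x^k + 1]}{(1 - x)^2} \qquad (9.51)$$

and Eq. (9.20), we obtain

$$L = p_0 \left\{s\rho \left[\frac{1}{p_0} - \frac{(s\rho)^s}{s!\,(1 - \rho)}\right] + \frac{s^s}{s!}\,\frac{\rho\,[(s + 1)\rho^s - s\rho^{s+1}]}{(1 - \rho)^2}\right\} = s\rho + \frac{\rho\,(s\rho)^s}{s!\,(1 - \rho)^2}\,p_0$$

Next, using Eqs. (9.21) and (9.50), the average number of customers in
the queue is

$$L_q = \sum_{n=s}^{\infty} (n - s)\,p_n = p_0\,\frac{(s\rho)^s}{s!} \sum_{n=s}^{\infty} (n - s)\rho^{n-s} = \frac{\rho\,(s\rho)^s}{s!\,(1 - \rho)^2}\,p_0 = L - s\rho$$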
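The M/M/s results of Probs. 9.10 to 9.12, including Erlang's delay formula (9.49), can be sketched numerically as follows. The function name and dictionary keys are illustrative assumptions; the sample call uses the pooled two-computer rates of Prob. 9.13(b) (33 jobs per hour in total, mean service time 3 minutes), expressed per minute.

```python
from math import factorial


def mms_measures(lam, mu, s):
    """Steady-state measures of an M/M/s queue, Eqs. (9.20)-(9.25) and (9.49)."""
    a = lam / mu                  # offered load, equal to s * rho
    rho = a / s                   # traffic intensity
    if rho >= 1:
        raise ValueError("steady state requires lam < s * mu")
    tail = a ** s / (factorial(s) * (1 - rho))
    p0 = 1.0 / (sum(a ** n / factorial(n) for n in range(s)) + tail)  # Eq. (9.20)
    erlang_c = tail * p0          # Eq. (9.49): P(arrival must join the queue)
    Lq = erlang_c * rho / (1 - rho)   # Eq. (9.23)
    L = Lq + a                        # Eq. (9.22)
    W = L / lam                       # Eq. (9.24)
    Wq = Lq / lam                     # Eq. (9.25)
    return {"rho": rho, "p0": p0, "C": erlang_c, "L": L, "Lq": Lq, "W": W, "Wq": Wq}


# Rates per minute for the pooled system of Prob. 9.13(b).
m2 = mms_measures(11 / 20, 1 / 3, 2)
print(m2["Wq"])
```

Setting s = 1 reduces these expressions to the M/M/1 formulas (9.13) and (9.15) to (9.18), which is a convenient sanity check.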
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9.13. A corporate computing center has two computers of the same capacity. 
The jobs arriving at the center are of two types, internal jobs and 
external jobs. These jobs have Poisson arrival times with rates 18 and 
15 per hour, respectively. The service time for a job is an exponential 
r.v. with mean 3 minutes. 


(a) Find the average waiting time per job when one computer is used 
exclusively for internal jobs and the other for external jobs. 


(b) Find the average waiting time per job when two computers handle 
both types of jobs. 


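(a) When the computers are used separately, we treat them as two
M/M/1 queueing systems. Let $W_{q1}$ and $W_{q2}$ be the average waiting
time per internal job and per external job, respectively. For internal
jobs, $\lambda_1 = \frac{18}{60} = \frac{3}{10}$ and $\mu_1 = \frac{1}{3}$. Then, from Eq. (9.17),

$$W_{q1} = \frac{\lambda_1}{\mu_1(\mu_1 - \lambda_1)} = \frac{\frac{3}{10}}{\frac{1}{3}\left(\frac{1}{3} - \frac{3}{10}\right)} = 27 \text{ min}$$

For external jobs, $\lambda_2 = \frac{15}{60} = \frac{1}{4}$ and $\mu_2 = \frac{1}{3}$, and

$$W_{q2} = \frac{\lambda_2}{\mu_2(\mu_2 - \lambda_2)} = \frac{\frac{1}{4}}{\frac{1}{3}\left(\frac{1}{3} - \frac{1}{4}\right)} = 9 \text{ min}$$

(b) When two computers handle both types of jobs, we model the
computing service as an M/M/2 queueing system with

$$\lambda = \frac{18 + 15}{60} = \frac{11}{20} \qquad \mu = \frac{1}{3} \qquad \rho = \frac{\lambda}{2\mu} = \frac{33}{40}$$

Now, substituting s = 2 in Eqs. (9.20), (9.22), (9.24), and (9.25), we
get

$$p_0 = \left[1 + 2\rho + \frac{(2\rho)^2}{2(1 - \rho)}\right]^{-1} = \frac{1 - \rho}{1 + \rho} \qquad (9.52)$$
$$L = 2\rho + \frac{4\rho^3}{2(1 - \rho)^2}\left(\frac{1 - \rho}{1 + \rho}\right) = \frac{2\rho}{1 - \rho^2} \qquad (9.53)$$
$$W_q = \frac{L}{\lambda} - \frac{1}{\mu} = \frac{1}{\mu}\,\frac{\rho^2}{1 - \rho^2} \qquad (9.54)$$

Thus, from Eq. (9.54), the average waiting time per job when both
computers handle both types of jobs is given by

$$W_q = 3\,\frac{\left(\frac{33}{40}\right)^2}{1 - \left(\frac{33}{40}\right)^2} \approx 6.39 \text{ min}$$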

From these results, we see that it is more efficient for both computers 
to handle both types of jobs. 


9.14. Derive Eqs. (9.27) and (9.28). 


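From Eqs. (9.26) and (9.10), we have

$$p_n = \left(\frac{\lambda}{\mu}\right)^n p_0 = \rho^n p_0 \qquad 0 \le n \le K \qquad (9.55)$$

From Eq. (9.11), $p_0$ is obtained by equating

$$\sum_{n=0}^{K} p_n = p_0 \sum_{n=0}^{K} \rho^n = 1$$

Using the summation formula

$$\sum_{n=0}^{K} x^n = \frac{1 - x^{K+1}}{1 - x} \qquad (9.56)$$

we obtain

$$p_0 = \frac{1 - \rho}{1 - \rho^{K+1}} \qquad \text{and} \qquad p_n = \frac{(1 - \rho)\rho^n}{1 - \rho^{K+1}}$$

Note that in this case, there is no need to impose the condition that
$\rho = \lambda/\mu < 1$.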


9.15. Derive Eq. (9.30). 


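Using Eqs. (9.28) and (9.51), the average number of customers in the
system is given by

$$L = E(N) = \sum_{n=0}^{K} n\,p_n = \frac{1 - \rho}{1 - \rho^{K+1}} \sum_{n=0}^{K} n\rho^n$$
$$= \frac{1 - \rho}{1 - \rho^{K+1}}\,\frac{\rho\,[K\rho^{K+1} - (K + 1)\rho^K + 1]}{(1 - \rho)^2} = \rho\,\frac{1 - (K + 1)\rho^K + K\rho^{K+1}}{(1 - \rho)(1 - \rho^{K+1})}$$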
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9.16. Consider the M/M/1/K queueing system. Show that 


$$L_q = L - (1 - p_0) \qquad (9.57)$$
$$W_q = \frac{1}{\mu}\,L \qquad (9.58)$$
$$W = \frac{1}{\mu}(L + 1) \qquad (9.59)$$

In the M/M/1/K queueing system, the average number of customers in
the system is

$$L = E(N) = \sum_{n=0}^{K} n\,p_n \qquad \text{and} \qquad \sum_{n=0}^{K} p_n = 1$$

The average number of customers in the queue is

$$L_q = E(N_q) = \sum_{n=1}^{K} (n - 1)\,p_n = \sum_{n=0}^{K} n\,p_n - \sum_{n=1}^{K} p_n = L - (1 - p_0)$$

A customer arriving with the queue in state $S_n$ has a wait time $T_q$ that is
the sum of n independent exponential r.v.'s, each with parameter $\mu$.
The expected value of this sum is $n/\mu$ [Eq. (4.132)]. Thus, the average
amount of time that a customer spends waiting in the queue is

$$W_q = E(T_q) = \sum_{n=0}^{K} \frac{n}{\mu}\,p_n = \frac{1}{\mu} \sum_{n=0}^{K} n\,p_n = \frac{1}{\mu}\,L$$

Similarly, the amount of time that a customer spends in the system is

$$W = E(T) = \sum_{n=0}^{K} \frac{n + 1}{\mu}\,p_n = \frac{1}{\mu}\left(\sum_{n=0}^{K} n\,p_n + \sum_{n=0}^{K} p_n\right) = \frac{1}{\mu}(L + 1)$$

Note that Eqs. (9.57) to (9.59) are equivalent to Eqs. (9.31) to (9.33)
(Prob. 9.27).


9.17. Derive Eqs. (9.35) and (9.36). 


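As in Prob. 9.10, from Eqs. (9.34) and (9.10), we have

$$p_n = p_0 \left(\frac{\lambda}{\mu}\right)^n \frac{1}{n!} \qquad n < s \qquad (9.60)$$
$$p_n = p_0 \left(\frac{\lambda}{\mu}\right)^n \frac{1}{s!\,s^{n-s}} \qquad s \le n \le K \qquad (9.61)$$

Let $\rho = \lambda/(s\mu)$. Then Eqs. (9.60) and (9.61) can be rewritten as

$$p_n = \begin{cases} \dfrac{(s\rho)^n}{n!}\,p_0 & n < s \\[2mm] \dfrac{\rho^n s^s}{s!}\,p_0 & s \le n \le K \end{cases}$$

which is Eq. (9.36). From Eq. (9.11), $p_0$ is obtained by equating

$$\sum_{n=0}^{K} p_n = p_0 \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \sum_{n=s}^{K} \frac{\rho^n s^s}{s!}\right] = 1$$

Using the summation formula (9.56), we obtain

$$p_0 = \left[\sum_{n=0}^{s-1} \frac{(s\rho)^n}{n!} + \frac{(s\rho)^s\,(1 - \rho^{K-s+1})}{s!\,(1 - \rho)}\right]^{-1}$$

which is Eq. (9.35).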
9.18. Derive Eq. (9.37). 


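Using Eqs. (9.36) and (9.51), the average number of customers in the
queue is given by

$$L_q = \sum_{n=s}^{K} (n - s)\,p_n = p_0\,\frac{(s\rho)^s}{s!} \sum_{m=0}^{K-s} m\rho^m$$
$$= p_0\,\frac{\rho\,(s\rho)^s}{s!\,(1 - \rho)^2}\left\{1 - [1 + (1 - \rho)(K - s)]\,\rho^{K-s}\right\}$$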
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9.19. Consider an M/M/s/s queueing system. Find the probability that all 
servers are busy. 


Setting K = s in Eqs. (9.60) and (9.61), we get

$$p_n = p_0 \left(\frac{\lambda}{\mu}\right)^n \frac{1}{n!} \qquad 0 \le n \le s \qquad (9.62)$$

and $p_0$ is obtained by equating

$$\sum_{n=0}^{s} p_n = p_0 \sum_{n=0}^{s} \left(\frac{\lambda}{\mu}\right)^n \frac{1}{n!} = 1$$

Thus,

$$p_0 = \left[\sum_{n=0}^{s} \left(\frac{\lambda}{\mu}\right)^n \frac{1}{n!}\right]^{-1} \qquad (9.63)$$

The probability that all servers are busy is given by

$$p_s = p_0 \left(\frac{\lambda}{\mu}\right)^s \frac{1}{s!} = \frac{(\lambda/\mu)^s / s!}{\displaystyle\sum_{n=0}^{s} (\lambda/\mu)^n / n!} \qquad (9.64)$$

Note that in an M/M/s/s queueing system, if an arriving customer finds
that all servers are busy, the customer will turn away and is lost. In a
telephone system with s trunks, $p_s$ is the portion of incoming calls
which will receive a busy signal. Equation (9.64) is often referred to as
Erlang's loss (or B) formula and is commonly denoted as $B(s, \lambda/\mu)$.


9.20. An air freight terminal has four loading docks on the main concourse.

Any aircraft which arrive when all docks are full are diverted to docks 

on the back concourse. The average aircraft arrival rate is 3 aircraft per 

hour. The average service time per aircraft is 2 hours on the main 

concourse and 3 hours on the back concourse. 

(a) Find the percentage of the arriving aircraft that are diverted to the 
back concourse. 


(b) If a holding area which can accommodate up to 8 aircraft is added 
to the main concourse, find the percentage of the arriving aircraft 
that are diverted to the back concourse and the expected delay time 
awaiting Service. 


(a) The service system at the main concourse can be modeled as an
M/M/s/s queueing system with s = 4, $\lambda = 3$, $\mu = \frac{1}{2}$, and $\lambda/\mu = 6$.
The percentage of the arriving aircraft that are diverted to the back
concourse is

100 × P(all docks on the main concourse are full)

From Eq. (9.64),

$$P(\text{all docks on the main concourse are full}) = p_4 = \frac{6^4/4!}{\displaystyle\sum_{n=0}^{4} 6^n/n!} = \frac{54}{115} \approx 0.47$$

Thus, the percentage of the arriving aircraft that are diverted to the
back concourse is about 47 percent.

(b) With the addition of a holding area for 8 aircraft, the service
system at the main concourse can now be modeled as an M/M/s/K
queueing system with s = 4, K = 12, and $\rho = \lambda/(s\mu) = 1.5$. Now,
from Eqs. (9.35) and (9.36),

$$p_0 = \left[\sum_{n=0}^{3} \frac{6^n}{n!} + \frac{6^4\,(1 - 1.5^9)}{4!\,(1 - 1.5)}\right]^{-1} \approx 0.00024$$
$$p_{12} = \frac{1.5^{12}\,4^4}{4!}\,p_0 \approx 0.332$$

Thus, about 33.2 percent of the arriving aircraft will still be
diverted to the back concourse.

Next, from Eq. (9.37), the average number of aircraft in the
queue is

$$L_q = 0.00024\,\frac{1.5\,(6^4)}{4!\,(1 - 1.5)^2}\left\{1 - [1 + (1 - 1.5)8]\,(1.5)^8\right\} \approx 6.0565$$

Then, from Eq. (9.40), the expected delay time waiting for service
is

$$W_q = \frac{L_q}{\lambda(1 - p_{12})} = \frac{6.0565}{3(1 - 0.332)} \approx 3.022 \text{ hours}$$

Note that when the 2-hour service time is added, the total expected 
processing time at the main concourse will be 5.022 hours 
compared to the 3-hour service time at the back concourse. 
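The calculations of Prob. 9.20 can be reproduced with a short Python sketch. It is an illustration only: the function names are assumptions, the probabilities are obtained by normalizing the weights of Eqs. (9.60) and (9.61) rather than from the closed form (9.35), and the queue length is summed directly instead of using Eq. (9.37). Because the text rounds $p_0$ to 0.00024 before multiplying, unrounded arithmetic gives values that differ slightly from the rounded figures above.

```python
from math import factorial


def mmsk_probs(lam, mu, s, K):
    """Steady-state probabilities p_0..p_K of an M/M/s/K queue, Eq. (9.36)."""
    a = lam / mu
    weights = []
    for n in range(K + 1):
        if n < s:
            weights.append(a ** n / factorial(n))                   # Eq. (9.60)
        else:
            weights.append(a ** n / (factorial(s) * s ** (n - s)))  # Eq. (9.61)
    total = sum(weights)
    return [w / total for w in weights]


def erlang_b(s, a):
    """Erlang's loss formula B(s, a) of Eq. (9.64), with a = lam / mu."""
    return mmsk_probs(a, 1.0, s, s)[s]


# Part (a): four docks, no holding area, lam / mu = 6.
blocked_a = erlang_b(4, 6.0)

# Part (b): holding area for 8 aircraft, so K = 12.
lam, mu, s, K = 3.0, 0.5, 4, 12
p = mmsk_probs(lam, mu, s, K)
Lq = sum((n - s) * pn for n, pn in enumerate(p) if n > s)   # average queue length
Wq = Lq / (lam * (1 - p[K]))                                # Eq. (9.40)
print(blocked_a, p[0], p[K], Lq, Wq)
```

Setting K = s in `mmsk_probs` is what turns the general model into the loss system of Prob. 9.19.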


SUPPLEMENTARY PROBLEMS 


9.21. Customers arrive at the express checkout lane in a supermarket in a 
Poisson process with a rate of 15 per hour. The time to check out a 
customer is an exponential r. v. with mean of 2 minutes. 

(a) Find the average number of customers present. 
(b) What is the expected idle delay time experienced by a customer? 
(c) What is the expected time for a customer to clear a system? 


9.22. Consider an M/M/1 queueing system. Find the probability of finding 
at least k customers in the system. 




9.23. In a university computer center, 80 jobs an hour are submitted on the 
average. Assuming that the computer service is modeled as an M/M/1 
queueing system, what should the service rate be if the average 
turnaround time (time at submission to time of getting job back) is to 
be less than 10 minutes? 


9.24. The capacity of a communication line is 2000 bits per second. The line
is used to transmit 8-bit characters, and the total volume of expected
calls for transmission from many devices to be sent on the line is
12,000 characters per minute. Find (a) the traffic intensity, (b) the
average number of characters waiting to be transmitted, and (c) the
average transmission (including queueing delay) time per character.


9.25. A bank counter is currently served by two tellers. Customers entering
the bank join a single queue and go to the next available teller when
they reach the head of the line. On the average, the service time for a
customer is 3 minutes, and 15 customers enter the bank per hour.
Assuming that the arrivals process is Poisson and the service time is an
exponential r.v., find the probability that a customer entering the bank
will have to wait for service.


9.26. A post office has three clerks serving at the counter. Customers arrive
on the average at the rate of 30 per hour, and arriving customers are
asked to form a single queue. The average service time for each
customer is 3 minutes. Assuming that the arrivals process is Poisson
and the service time is an exponential r.v., find (a) the probability that
all the clerks will be busy, (b) the average number of customers in the
queue, and (c) the average length of time customers have to spend in
the post office.


9.27. Show that Eqs. (9.57) to (9.59) and Eqs. (9.31) to (9.33) are
equivalent.


9.28. Find the average number of customers L in the M/M/1/K queueing
system when $\lambda = \mu$.


9.29. A gas station has one diesel fuel pump for trucks only and has room for
three trucks (including one at the pump). On the average, trucks arrive
at the rate of 4 per hour, and each truck takes 10 minutes to service.
Assume that the arrivals process is Poisson and the service time is an
exponential r.v.

(a) What is the average time for a truck from entering to leaving the
station?

(b) What is the average time for a truck to wait for service?

(c) What percentage of the truck traffic is being turned away?


9.30. Consider the air freight terminal service of Prob. 9.20. How many
additional docks are needed so that at least 80 percent of the arriving
aircraft can be served in the main concourse with the addition of
holding area?


ANSWERS TO SUPPLEMENTARY PROBLEMS 


9.21. (a) 1; (b) 2 min; (c) 4 min

9.22. $\rho^k = (\lambda/\mu)^k$

9.23. 1.43 jobs per minute

9.24. (a) 0.8; (b) 3.2; (c) 20 ms

9.25. 0.205

9.26. (a) 0.237; (b) 0.237; (c) 3.947 min

9.27. Hint: Use Eq. (9.29).

9.28. K/2

9.29. (a) 20.15 min; (b) 10.14 min; (c) 12.3 percent

9.30. 4


CHAPTER 10 


Information Theory 


10.1 Introduction 


Information theory provides a quantitative measure of the information contained in 
message signals and allows us to determine the capacity of a communication system 
to transfer this information from source to destination. In this chapter we briefly 
explore some basic ideas involved in information theory. 


10.2 Measure of Information 


A. Information Sources: 


An information source is an object that produces an event, the outcome of which is 
selected at random according to a probability distribution. A practical source in a 
communication system is a device that produces messages, and it can be either 
analog or discrete. In this chapter we deal mainly with the discrete sources, since 
analog sources can be transformed to discrete sources through the use of sampling 
and quantization techniques. A discrete information source is a source that has only 
a finite set of symbols as possible outputs. The set of source symbols is called the 
source alphabet, and the elements of the set are called symbols or letters. 

Information sources can be classified as having memory or being memoryless. A 
source with memory is one for which a current symbol depends on the previous 
symbols. A memoryless source is one for which each symbol produced is 
independent of the previous symbols. 

A discrete memoryless source (DMS) can be characterized by the list of the 
symbols, the probability assignment to these symbols, and the specification of the 
rate of generating these symbols by the source. 


B. Information Content of a Discrete Memoryless Source: 


The amount of information contained in an event is closely related to its uncertainty. 
Messages containing knowledge of high probability of occurrence convey relatively 
little information. We note that if an event is certain (that is, the event occurs with 
probability 1), it conveys zero information. 

Thus, a mathematical measure of information should be a function of the 
probability of the outcome and should satisfy the following axioms: 


1. Information should be proportional to the uncertainty of an outcome. 
2. Information contained in independent outcomes should add. 


1. Information Content of a Symbol: 
Consider a DMS, denoted by X, with alphabet $\{x_1, x_2, \ldots, x_m\}$. The information
content of a symbol $x_i$, denoted by $I(x_i)$, is defined by

$$I(x_i) = \log_b \frac{1}{P(x_i)} = -\log_b P(x_i) \qquad (10.1)$$

where $P(x_i)$ is the probability of occurrence of symbol $x_i$. Note that $I(x_i)$ satisfies the
following properties:

$$I(x_i) = 0 \quad \text{for } P(x_i) = 1 \qquad (10.2)$$
$$I(x_i) \ge 0 \qquad (10.3)$$
$$I(x_i) > I(x_j) \quad \text{if } P(x_i) < P(x_j) \qquad (10.4)$$
$$I(x_i x_j) = I(x_i) + I(x_j) \quad \text{if } x_i \text{ and } x_j \text{ are independent} \qquad (10.5)$$

The unit of $I(x_i)$ is the bit (binary unit) if b = 2, Hartley or decit if b = 10, and nat
(natural unit) if b = e. It is standard to use b = 2. Here the unit bit (abbreviated "b")
is a measure of information content and is not to be confused with the term bit
meaning "binary digit." The conversion of these units to other units can be achieved
by the following relationships.

$$\log_2 a = \frac{\ln a}{\ln 2} = \frac{\log a}{\log 2} \qquad (10.6)$$

2. Average Information or Entropy: 

In a practical communication system, we usually transmit long sequences of 
symbols from an information source. Thus, we are more interested in the average 
information that a source produces than the information content of a single symbol. 


The mean value of $I(x_i)$ over the alphabet of source X with m different symbols is
given by

$$H(X) = E[I(x_i)] = \sum_{i=1}^{m} P(x_i)\,I(x_i) = -\sum_{i=1}^{m} P(x_i) \log_2 P(x_i) \quad \text{b/symbol} \qquad (10.7)$$


The quantity H(X) is called the entropy of source X. It is a measure of the average 
information content per source symbol. The source entropy H(X) can be considered 
as the average amount of uncertainty within source X that is resolved by use of the 
alphabet. 

Note that for a binary source X that generates independent symbols 0 and 1 with
equal probability, the source entropy H(X) is

$$H(X) = -\frac{1}{2} \log_2 \frac{1}{2} - \frac{1}{2} \log_2 \frac{1}{2} = 1 \text{ b/symbol} \qquad (10.8)$$

The source entropy H(X) satisfies the following relation:

$$0 \le H(X) \le \log_2 m \qquad (10.9)$$

where m is the size (number of symbols) of the alphabet of source X (Prob. 10.5).
The lower bound corresponds to no uncertainty, which occurs when one symbol has
probability $P(x_i) = 1$ while $P(x_j) = 0$ for j ≠ i, so X emits the same symbol $x_i$ all the
time. The upper bound corresponds to the maximum uncertainty, which occurs when
$P(x_i) = 1/m$ for all i, that is, when all symbols are equally likely to be emitted by X.


3. Information Rate: 


If the time rate at which source X emits symbols is r (symbols/s), the information
rate R of the source is given by

$$R = rH(X) \quad \text{b/s} \qquad (10.10)$$
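The definitions in Eqs. (10.1), (10.7), and (10.10) translate directly into a few lines of Python. This is a minimal sketch with b = 2; the function names and the sample probabilities are illustrative assumptions.

```python
from math import log2


def information_content(p):
    """I(x_i) = -log2 P(x_i), in bits, Eq. (10.1) with b = 2."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -log2(p)


def entropy(probs):
    """H(X) in b/symbol, Eq. (10.7). Terms with P(x_i) = 0 contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * log2(p) for p in probs if p > 0)


def information_rate(r, probs):
    """R = r H(X) in b/s, Eq. (10.10), for r symbols per second."""
    return r * entropy(probs)


# Equiprobable binary source of Eq. (10.8), and an assumed four-symbol source.
print(entropy([0.5, 0.5]))
print(entropy([0.5, 0.25, 0.125, 0.125]))
```

For an alphabet of size m, the value returned by `entropy` should always lie between 0 and $\log_2 m$, in line with Eq. (10.9).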


10.3 Discrete Memoryless Channels 


A. Channel Representation: 


A communication channel is the path or medium through which the symbols flow to
the receiver. A discrete memoryless channel (DMC) is a statistical model with an
input X and an output Y (Fig. 10-1). During each unit of the time (signaling
interval), the channel accepts an input symbol from X, and in response it generates
an output symbol from Y. The channel is "discrete" when the alphabets of X and Y
are both finite. It is "memoryless" when the current output depends on only the
current input and not on any of the previous inputs.


Fig. 10-1 Discrete memoryless channel.


A diagram of a DMC with m inputs and n outputs is illustrated in Fig. 10-1. The
input X consists of input symbols $x_1, x_2, \ldots, x_m$. The a priori probabilities of these
source symbols $P(x_i)$ are assumed to be known. The output Y consists of output
symbols $y_1, y_2, \ldots, y_n$. Each possible input-to-output path is indicated along with a
conditional probability $P(y_j|x_i)$, where $P(y_j|x_i)$ is the conditional probability of
obtaining output $y_j$ given that the input is $x_i$, and is called a channel transition
probability.


B. Channel Matrix: 


A channel is completely specified by the complete set of transition probabilities.
Accordingly, the channel of Fig. 10-1 is often specified by the matrix of transition
probabilities [P(Y|X)], given by

$$[P(Y|X)] = \begin{bmatrix} P(y_1|x_1) & P(y_2|x_1) & \cdots & P(y_n|x_1) \\ P(y_1|x_2) & P(y_2|x_2) & \cdots & P(y_n|x_2) \\ \vdots & \vdots & & \vdots \\ P(y_1|x_m) & P(y_2|x_m) & \cdots & P(y_n|x_m) \end{bmatrix} \qquad (10.11)$$

The matrix [P(Y|X)] is called the channel matrix. Since each input to the channel
results in some output, each row of the channel matrix must sum to unity; that is,

$$\sum_{j=1}^{n} P(y_j|x_i) = 1 \quad \text{for all } i \qquad (10.12)$$

Now, if the input probabilities P(X) are represented by the row matrix

$$[P(X)] = [P(x_1) \;\; P(x_2) \;\; \cdots \;\; P(x_m)] \qquad (10.13)$$

and the output probabilities P(Y) are represented by the row matrix

$$[P(Y)] = [P(y_1) \;\; P(y_2) \;\; \cdots \;\; P(y_n)] \qquad (10.14)$$

then

$$[P(Y)] = [P(X)]\,[P(Y|X)] \qquad (10.15)$$

If P(X) is represented as a diagonal matrix

$$[P(X)]_d = \begin{bmatrix} P(x_1) & 0 & \cdots & 0 \\ 0 & P(x_2) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & P(x_m) \end{bmatrix} \qquad (10.16)$$

then

$$[P(X, Y)] = [P(X)]_d\,[P(Y|X)] \qquad (10.17)$$

where the (i, j) element of matrix [P(X, Y)] has the form $P(x_i, y_j)$. The matrix [P(X,
Y)] is known as the joint probability matrix, and the element $P(x_i, y_j)$ is the joint
probability of transmitting $x_i$ and receiving $y_j$.
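The matrix relations (10.15) and (10.17) can be illustrated with plain Python lists. The sketch below is only an example: the function names, the 2 × 2 channel with transition probability 0.1, and the input probabilities 0.8 and 0.2 are all assumed for illustration.

```python
def output_probs(px, channel):
    """[P(Y)] = [P(X)][P(Y|X)], Eq. (10.15). channel[i][j] is P(y_j | x_i)."""
    n_out = len(channel[0])
    return [sum(px[i] * channel[i][j] for i in range(len(px))) for j in range(n_out)]


def joint_matrix(px, channel):
    """[P(X, Y)] = [P(X)]_d [P(Y|X)], Eq. (10.17). Entry (i, j) is P(x_i, y_j)."""
    return [[px[i] * pji for pji in row] for i, row in enumerate(channel)]


def check_rows(channel, tol=1e-9):
    """Verify the row-sum condition of Eq. (10.12)."""
    return all(abs(sum(row) - 1.0) < tol for row in channel)


# Assumed example: two inputs, two outputs, crossover probability 0.1.
channel = [[0.9, 0.1],
           [0.1, 0.9]]
px = [0.8, 0.2]
py = output_probs(px, channel)
pxy = joint_matrix(px, channel)
print(check_rows(channel), py, pxy)
```

Summing each row of the joint matrix recovers the input probabilities, and summing each column recovers the output probabilities.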


C. Special Channels: 


1. Lossless Channel: 

A channel described by a channel matrix with only one nonzero element in each 
column is called a lossless channel. An example of a lossless channel is shown in 
Fig. 10-2, and the corresponding channel matrix is shown in Eq. (10.18). 


$$[P(Y|X)] = \begin{bmatrix} \frac{3}{4} & \frac{1}{4} & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{3} & \frac{2}{3} & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \qquad (10.18)$$


Fig. 10-2 Lossless channel. 


It can be shown that in the lossless channel no source information is lost in 
transmission. [See Eq. (10.35) and Prob. 10.10.] 


2. Deterministic Channel: 

A channel described by a channel matrix with only one nonzero element in each row
is called a deterministic channel. An example of a deterministic channel is shown in
Fig. 10-3, and the corresponding channel matrix is shown in Eq. (10.19).


Fig. 10-3 Deterministic channel.

$$[P(Y|X)] = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (10.19)$$

Note that since each row has only one nonzero element, this element must be unity
by Eq. (10.12). Thus, when a given source symbol is sent in the deterministic
channel, it is clear which output symbol will be received.


3. Noiseless Channel: 


A channel is called noiseless if it is both lossless and deterministic. A noiseless 
channel is shown in Fig. 10-4. The channel matrix has only one element in each row 
and in each column, and this element is unity. Note that the input and output 
alphabets are of the same size; that is, m =n for the noiseless channel. 




Fig. 10-4 Noiseless channel. 


4. Binary Symmetric Channel: 


The binary symmetric channel (BSC) is defined by the channel diagram shown in 
Fig. 10-5, and its channel matrix is given by 


[P(Y | X)] = \begin{bmatrix} 1 - p & p \\ p & 1 - p \end{bmatrix}    (10.20)


The channel has two inputs (x_1 = 0, x_2 = 1) and two outputs (y_1 = 0, y_2 = 1). The
channel is symmetric because the probability of receiving a 1 if a 0 is sent is the
same as the probability of receiving a 0 if a 1 is sent. This common transition
probability is denoted by p.


Fig. 10-5 Binary symmetrical channel.


10.4 Mutual Information 


A. Conditional and Joint Entropies: 


Using the input probabilities P(x_i), output probabilities P(y_j), transition
probabilities P(y_j | x_i), and joint probabilities P(x_i, y_j), we can define the
following entropy functions for a channel with m inputs and n outputs:

H(X) = -\sum_{i=1}^{m} P(x_i) \log_2 P(x_i)    (10.21)

H(Y) = -\sum_{j=1}^{n} P(y_j) \log_2 P(y_j)    (10.22)

H(X | Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i | y_j)    (10.23)

H(Y | X) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(y_j | x_i)    (10.24)

H(X, Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i, y_j)    (10.25)


These entropies can be interpreted as follows: H(X) is the average uncertainty of the
channel input, and H(Y) is the average uncertainty of the channel output. The
conditional entropy H(X | Y) is a measure of the average uncertainty remaining about
the channel input after the channel output has been observed; H(X | Y) is
sometimes called the equivocation of X with respect to Y. The conditional entropy
H(Y | X) is the average uncertainty of the channel output given that X was transmitted.
The joint entropy H(X, Y) is the average uncertainty of the communication channel
as a whole.

Two useful relationships among the above entropies are

H(X, Y) = H(X | Y) + H(Y)    (10.26)
H(X, Y) = H(Y | X) + H(X)    (10.27)

Note that if X and Y are independent, then

H(X | Y) = H(X)    (10.28)
H(Y | X) = H(Y)    (10.29)
H(X, Y) = H(X) + H(Y)    (10.30)
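The relationships (10.26) and (10.27) can be checked numerically. The following is a minimal sketch, assuming a two-input, two-output channel; the function names `entropy` and `channel_entropies` and the example numbers are illustrative choices, not part of the text. The conditional entropies are computed directly from their definitions, Eqs. (10.23) and (10.24), so the chain rules are verified rather than assumed.

```python
import math


def entropy(probs):
    """Entropy in bits of a list of probabilities (0 log 0 is taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def channel_entropies(px, channel):
    """Return the five entropies of Eqs. (10.21)-(10.25), in bits.

    px      -- input probabilities P(x_i)
    channel -- channel matrix, channel[i][j] = P(y_j | x_i)
    """
    m, n = len(px), len(channel[0])
    # Joint probabilities P(x_i, y_j) = P(x_i) P(y_j | x_i), as in Eq. (10.17).
    joint = [[px[i] * channel[i][j] for j in range(n)] for i in range(m)]
    py = [sum(joint[i][j] for i in range(m)) for j in range(n)]
    pairs = [(i, j) for i in range(m) for j in range(n) if joint[i][j] > 0]
    h_x_given_y = -sum(joint[i][j] * math.log2(joint[i][j] / py[j]) for i, j in pairs)
    h_y_given_x = -sum(joint[i][j] * math.log2(channel[i][j]) for i, j in pairs)
    return {
        "H(X)": entropy(px),
        "H(Y)": entropy(py),
        "H(X|Y)": h_x_given_y,
        "H(Y|X)": h_y_given_x,
        "H(X,Y)": entropy([joint[i][j] for i, j in pairs]),
    }


# Illustrative channel (assumed values): equally likely inputs.
h = channel_entropies([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]])
print(h)
print(abs(h["H(X,Y)"] - (h["H(X|Y)"] + h["H(Y)"])) < 1e-12)  # Eq. (10.26)
print(abs(h["H(X,Y)"] - (h["H(Y|X)"] + h["H(X)"])) < 1e-12)  # Eq. (10.27)
```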


B. Mutual Information: 


The mutual information I(X; Y) of a channel is defined by

I(X; Y) = H(X) - H(X | Y)    b/symbol    (10.31)


Since H(X) represents the uncertainty about the channel input before the channel
output is observed and H(X | Y) represents the uncertainty about the channel input
after the channel output is observed, the mutual information I(X; Y) represents the
uncertainty about the channel input that is resolved by observing the channel output.


Properties of I(X; Y):

1. I(X; Y) = I(Y; X)    (10.32)
2. I(X; Y) \ge 0    (10.33)
3. I(X; Y) = H(Y) - H(Y | X)    (10.34)
4. I(X; Y) = H(X) + H(Y) - H(X, Y)    (10.35)
5. I(X; Y) = 0 if X and Y are independent    (10.36)

Note that from Eqs. (10.31), (10.33), and (10.34) we have

6. H(X) \ge H(X | Y)    (10.37)
7. H(Y) \ge H(Y | X)    (10.38)

with equality if X and Y are independent.


C. Relative Entropy: 
The relative entropy between two pmf's p(x_i) and q(x_i) on X is defined as

D(p // q) = \sum_{i} p(x_i) \log_2 \frac{p(x_i)}{q(x_i)}    (10.39)

The relative entropy, also known as the Kullback-Leibler divergence, measures the
"closeness" of one distribution to another. It can be shown that (Prob. 10.15)

D(p // q) \ge 0    (10.40)

with equality if and only if p(x_i) = q(x_i). Note that, in general, the relative
entropy is not symmetric; that is, D(p // q) \ne D(q // p).
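The definition (10.39) and the asymmetry noted above can be illustrated with a small computation. This is a sketch only; the function name `kl_divergence` and the two example pmf's are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p // q) in bits, Eq. (10.39).

    Terms with p_i = 0 contribute zero; if q_i = 0 where p_i > 0 the
    divergence is infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log2(p_i / q_i)
    return total


# Two illustrative pmf's on a binary alphabet (assumed values).
p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
print(d_pq, d_qp)           # the two values differ: D is not symmetric
print(kl_divergence(p, p))  # zero when the distributions coincide
```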
The mutual information I(X; Y) can be expressed as the relative entropy between
the joint distribution p_{XY}(x, y) and the product of the marginal distributions
p_X(x) p_Y(y); that is,

I(X; Y) = D(p_{XY}(x, y) // p_X(x) p_Y(y))    (10.41)

        = \sum_{x_i} \sum_{y_j} p_{XY}(x_i, y_j) \log_2 \frac{p_{XY}(x_i, y_j)}{p_X(x_i) p_Y(y_j)}    (10.42)


10.5 Channel Capacity 
A. Channel Capacity per Symbol C,: 


The channel capacity per symbol of a DMC is defined as

C_s = \max_{\{P(x_i)\}} I(X; Y)    b/symbol    (10.43)

where the maximization is over all possible input probability distributions {P(x_i)}
on X. Note that the channel capacity C_s is a function of only the channel transition
probabilities that define the channel.


B. Channel Capacity per Second C: 


If r symbols are being transmitted per second, then the maximum rate of
transmission of information per second is rC_s. This is the channel capacity per
second and is denoted by C (b/s):

C = rC_s    b/s    (10.44)


C. Capacities of Special Channels: 


1. Lossless Channel: 
For a lossless channel, H(X | Y) = 0 (Prob. 10.12) and

I(X; Y) = H(X)    (10.45)

Thus, the mutual information (information transfer) is equal to the input (source)
entropy, and no source information is lost in transmission. Consequently, the
channel capacity per symbol is

C_s = \max_{\{P(x_i)\}} H(X) = \log_2 m    (10.46)

where m is the number of symbols in X.


2. Deterministic Channel: 
For a deterministic channel, H(Y | X) = 0 for all input distributions P(x_i), and

I(X; Y) = H(Y)    (10.47)

Thus, the information transfer is equal to the output entropy. The channel capacity
per symbol is

C_s = \max_{\{P(x_i)\}} H(Y) = \log_2 n    (10.48)

where n is the number of symbols in Y.


3. Noiseless Channel: 
Since a noiseless channel is both lossless and deterministic, we have

I(X; Y) = H(X) = H(Y)    (10.49)

and the channel capacity per symbol is

C_s = \log_2 m = \log_2 n    (10.50)


4. Binary Symmetric Channel:
For the BSC of Fig. 10-5, the mutual information is (Prob. 10.20)

I(X; Y) = H(Y) + p \log_2 p + (1 - p) \log_2 (1 - p)    (10.51)

and the channel capacity per symbol is

C_s = 1 + p \log_2 p + (1 - p) \log_2 (1 - p)    (10.52)
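Equation (10.52) is easy to evaluate numerically. The sketch below assumes the helper names `binary_entropy` and `bsc_capacity`, which are illustrative and not taken from the text; it simply evaluates the formula for a few transition probabilities p.

```python
import math


def binary_entropy(p):
    """-p log2 p - (1 - p) log2 (1 - p), with 0 log 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p):
    """Capacity per symbol of a BSC with transition probability p, Eq. (10.52)."""
    return 1.0 - binary_entropy(p)


for p in (0.0, 0.1, 0.5):
    print(p, round(bsc_capacity(p), 3))
```

For p = 0.5 the capacity is zero: the output is then statistically independent of the input.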


10.6 Continuous Channel 


In a continuous channel an information source produces a continuous signal x(t).
The set of possible signals is considered as an ensemble of waveforms generated by
some ergodic random process. It is further assumed that x(t) has a finite bandwidth
so that x(t) is completely characterized by its periodic sample values. Thus, at any
sampling instant, the collection of possible sample values constitutes a continuous
random variable X described by its probability density function f_X(x).


A. Differential Entropy: 


The average amount of information per sample value of x(t) is measured by

H(X) = -\int_{-\infty}^{\infty} f_X(x) \log_2 f_X(x) dx    b/sample    (10.53)

The entropy defined by Eq. (10.53) is known as the differential entropy of a
continuous r.v. X with pdf f_X(x). Note that as opposed to the discrete case [see
Eq. (10.9)], the differential entropy can be negative (see Prob. 10.24).


Properties of Differential Entropy:

(1) H(X + c) = H(X)    (10.54)

Translation does not change the differential entropy. Equation (10.54) follows
directly from the definition Eq. (10.53).

(2) H(aX) = H(X) + \log_2 |a|    (10.55)

Equation (10.55) is proved in Prob. 10.29.


B. Mutual Information: 


The average mutual information in a continuous channel is defined (by analogy with 
the discrete case) as 


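I(X; Y) = H(X) - H(X | Y)    (10.56)

or

I(X; Y) = H(Y) - H(Y | X)    (10.57)

where

H(Y) = -\int_{-\infty}^{\infty} f_Y(y) \log_2 f_Y(y) dy    (10.58)

H(X | Y) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y) \log_2 f_X(x | y) dx dy    (10.59)

H(Y | X) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y) \log_2 f_Y(y | x) dx dy    (10.60)
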
C. Relative Entropy: 


Similar to the discrete case, the relative entropy (or Kullback-Leibler divergence)
between two pdf's f_X(x) and g_X(x) on continuous r.v. X is defined by

D(f // g) = \int_{-\infty}^{\infty} f_X(x) \log_2 \frac{f_X(x)}{g_X(x)} dx    (10.61)

From this definition, we can express the average mutual information I(X; Y) as

I(X; Y) = D(f_{XY}(x, y) // f_X(x) f_Y(y))    (10.62)

        = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y) \log_2 \frac{f_{XY}(x, y)}{f_X(x) f_Y(y)} dx dy    (10.63)

D. Properties of Differential Entropy, Relative Entropy, and Mutual 
Information: 


1. D(f // g) \ge 0    (10.64)
with equality iff f = g. (See Prob. 10.30.)

2. I(X; Y) \ge 0    (10.65)
3. H(X) \ge H(X | Y)    (10.66)
4. H(Y) \ge H(Y | X)    (10.67)

with equality iff X and Y are independent.


10.7 Additive White Gaussian Noise Channel 
A. Additive White Gaussian Noise Channel: 
An additive white Gaussian noise (AWGN) channel is depicted in Fig. 10-6. 


Fig. 10-6 Gaussian channel.


This is a time-discrete channel with output Y_i at time i, where Y_i is the sum of the
input X_i and the noise Z_i:

Y_i = X_i + Z_i    (10.68)

The noise Z_i is drawn i.i.d. from a Gaussian distribution with zero mean and
variance N. The noise Z_i is assumed to be independent of the input signal X_i. We
also assume that the average power of the input signal is finite. Thus,

E(X_i^2) = S    (10.69)
E(Z_i^2) = N    (10.70)


The capacity C_s of an AWGN channel is given by (Prob. 10.31)

C_s = \max_{\{f_X(x)\}} I(X; Y) = \frac{1}{2} \log_2 \left( 1 + \frac{S}{N} \right)    (10.71)

where S/N is the signal-to-noise ratio at the channel output.


B. Band-Limited Channel: 


A common model for a communication channel is a band-limited channel with 
additive white Gaussian noise. This is a continuous-time channel. 


Nyquist Sampling Theorem 

Suppose a signal x(t) is band-limited to B; namely, the Fourier transform of the
signal x(t) is 0 for all frequencies greater than B (Hz). Then the signal x(t) is
completely determined by its periodic sample values taken at the Nyquist rate 2B
samples/s. (For the proof of this sampling theorem, see any Fourier transform
text.)


C. Capacity of the Continuous-Time Gaussian Channel: 


If the channel bandwidth B (Hz) is fixed, then the output y(t) is also a band-limited
signal. Then the capacity C (b/s) of the continuous-time AWGN channel is given by
(see Prob. 10.32)

C = 2BC_s = B \log_2 \left( 1 + \frac{S}{N} \right)    b/s    (10.72)


Equation (10.72) is known as the Shannon-Hartley law. 

The Shannon-Hartley law underscores the fundamental role of bandwidth and 
signal-to-noise ratio in communication. It also shows that we can exchange 
increased bandwidth for decreased signal power (Prob. 10.35) for a system with 
given capacity C. 
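The bandwidth-for-power exchange implied by Eq. (10.72) can be made concrete with a short calculation. This is a sketch under assumed numbers (a 3-kHz channel with a signal-to-noise ratio of 1000); the function names `awgn_capacity` and `required_snr` are illustrative.

```python
import math


def awgn_capacity(bandwidth_hz, snr):
    """Shannon-Hartley capacity C = B log2(1 + S/N) in b/s, Eq. (10.72).

    snr is the linear (not dB) signal-to-noise ratio S/N.
    """
    return bandwidth_hz * math.log2(1.0 + snr)


def required_snr(capacity_bps, bandwidth_hz):
    """Linear S/N needed to reach a given capacity in a given bandwidth."""
    return 2.0 ** (capacity_bps / bandwidth_hz) - 1.0


# Assumed example values: B = 3 kHz, S/N = 1000 (30 dB).
c = awgn_capacity(3000.0, 1000.0)
print(round(c), "b/s")

# Holding C fixed, doubling the bandwidth sharply reduces the S/N needed.
snr_wide = required_snr(c, 6000.0)
print(round(snr_wide, 1))
```

Note that this treats S/N as a given number at each bandwidth; with white noise the noise power itself grows with B, which the sketch does not model.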


10.8 Source Coding 


A conversion of the output of a DMS into a sequence of binary symbols (binary
code word) is called source coding. The device that performs this conversion is
called the source encoder (Fig. 10-7).


Fig. 10-7 Source coding. (Discrete memoryless source, followed by the source
encoder, which outputs a binary sequence.)


An objective of source coding is to minimize the average bit rate required for 
representation of the source by reducing the redundancy of the information source. 


A. Code Length and Code Efficiency: 


Let X be a DMS with finite entropy H(X) and an alphabet {x_1, ..., x_m} with
corresponding probabilities of occurrence P(x_i) (i = 1, ..., m). Let the binary code
word assigned to symbol x_i by the encoder have length n_i, measured in bits. The
length of a code word is the number of binary digits in the code word. The average
code word length L, per source symbol, is given by

L = \sum_{i=1}^{m} P(x_i) n_i    (10.73)

The parameter L represents the average number of bits per source symbol used in
the source coding process.
The code efficiency \eta is defined as

\eta = \frac{L_{min}}{L}    (10.74)

where L_{min} is the minimum possible value of L. When \eta approaches unity, the code
is said to be efficient.
The code redundancy \gamma is defined as

\gamma = 1 - \eta    (10.75)

B. Source Coding Theorem: 


The source coding theorem states that for a DMS X with entropy H(X), the average
code word length L per symbol is bounded as (Prob. 10.39)

L \ge H(X)    (10.76)

and further, L can be made as close to H(X) as desired for some suitably chosen
code.

Thus, with L_{min} = H(X), the code efficiency can be rewritten as

\eta = \frac{H(X)}{L}    (10.77)


C. Classification of Codes: 


Classification of codes is best illustrated by an example. Consider Table 10-1 where 
a source of size 4 has been encoded in binary codes with symbol 0 and 1. 


TABLE 10-1 Binary Codes

x_i    CODE 1    CODE 2    CODE 3    CODE 4    CODE 5    CODE 6
x_1    00        00        0         0         0         1
x_2    01        01        1         10        01        01
x_3    00        10        00        110       011       001
x_4    11        11        11        111       0111      0001


1. Fixed-Length Codes: 


A fixed-length code is one whose code word length is fixed. Code 1 and code 2 of 
Table 10-1 are fixed-length codes with length 2. 


2. Variable-Length Codes: 


A variable-length code is one whose code word length is not fixed. All codes of 
Table 10-1 except codes 1 and 2 are variable-length codes. 


3. Distinct Codes: 


A code is distinct if each code word is distinguishable from other code words. All
codes of Table 10-1 except code 1 are distinct codes; notice the codes for x_1 and x_3.


4. Prefix-Free Codes: 


A code in which no code word can be formed by adding code symbols to another 
code word is called a prefix-free code. Thus, in a prefix-free code no code word is a 
prefix of another. Codes 2, 4, and 6 of Table 10-1 are prefix-free codes. 


5. Uniquely Decodable Codes: 


A distinct code is uniquely decodable if the original source sequence can be
reconstructed perfectly from the encoded binary sequence. Note that code 3 of Table
10-1 is not a uniquely decodable code. For example, the binary sequence 1001 may
correspond to the source sequences x_2 x_3 x_2 or x_2 x_1 x_1 x_2. A sufficient
condition to ensure that a code is uniquely decodable is that no code word is a
prefix of another. Thus, the prefix-free codes 2, 4, and 6 are uniquely decodable
codes. Note that the prefix-free condition is not a necessary condition for unique
decodability. For example, code 5 of Table 10-1 does not satisfy the prefix-free
condition, and yet it is uniquely decodable since the bit 0 indicates the beginning
of each code word of the code.


6. Instantaneous Codes: 


A uniquely decodable code is called an instantaneous code if the end of any code 
word is recognizable without examining subsequent code symbols. The 
instantaneous codes have the property previously mentioned that no code word is a 
prefix of another code word. For this reason, prefix-free codes are sometimes called 
instantaneous codes. 
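The prefix-free property, and the symbol-by-symbol decoding it permits, can be sketched in a few lines. The two four-word codes used below are assumed for illustration (one prefix-free, one in which every word begins with 0 and is a prefix of the next), and the function names are hypothetical.

```python
def is_prefix_free(code_words):
    """True if no code word is a prefix of a different code word."""
    for i, a in enumerate(code_words):
        for j, b in enumerate(code_words):
            if i != j and b.startswith(a):
                return False
    return True


def decode_instantaneous(bits, code):
    """Decode a bit string with a prefix-free code.

    code maps symbol -> code word. Each code word is recognized as soon as
    its last bit arrives, without looking at later bits.
    """
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return symbols


# Illustrative codes (assumed): the first is prefix-free, the second is not.
prefix_free_code = {"x1": "0", "x2": "10", "x3": "110", "x4": "111"}
not_prefix_free = ["0", "01", "011", "0111"]

print(is_prefix_free(list(prefix_free_code.values())))
print(is_prefix_free(not_prefix_free))
print(decode_instantaneous("0101100111", prefix_free_code))
```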


7. Optimal Codes: 


A code is said to be optimal if it is instantaneous and has minimum average length L 
for a given source with a given probability assignment for the source symbols. 


D. Kraft Inequality: 
Let X be a DMS with alphabet {x_i} (i = 1, 2, ..., m). Assume that the length of the
assigned binary code word corresponding to x_i is n_i.

A necessary and sufficient condition for the existence of an instantaneous binary
code is

K = \sum_{i=1}^{m} 2^{-n_i} \le 1    (10.78)

which is known as the Kraft inequality. (See Prob. 10.43.)


Note that the Kraft inequality assures us of the existence of an instantaneously 
decodable code with code word lengths that satisfy the inequality. But it does not 
show us how to obtain these code words, nor does it say that any code that satisfies 
the inequality is automatically uniquely decodable (Prob. 10.38). 
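The Kraft sum is simple to compute exactly, and one standard way to obtain code words from lengths that satisfy the inequality is a canonical assignment in order of increasing length. The sketch below is that construction, which the text does not describe; the function names and the example lengths are assumptions made for illustration.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """K = sum of 2^(-n_i) over the code word lengths, as an exact fraction."""
    return sum(Fraction(1, 2 ** n) for n in lengths)


def canonical_prefix_code(lengths):
    """Build one prefix-free binary code with the given word lengths.

    Words are assigned in order of increasing length; each new word is the
    previous word plus one, left-shifted to the new length.
    """
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    words = [None] * len(lengths)
    value, previous_length = 0, 0
    for index in sorted(range(len(lengths)), key=lambda i: lengths[i]):
        n = lengths[index]
        value <<= n - previous_length
        words[index] = format(value, "0{}b".format(n))
        value += 1
        previous_length = n
    return words


print(kraft_sum([1, 2, 3, 3]))   # satisfies the inequality with equality
print(kraft_sum([1, 1, 2, 2]))   # exceeds 1: no instantaneous code exists
print(canonical_prefix_code([1, 2, 3, 3]))
```

Satisfying the inequality says only that some instantaneous code with those lengths exists; an arbitrary code with the same lengths need not be uniquely decodable.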


10.9 Entropy Coding 


The design of a variable-length code such that its average code word length 
approaches the entropy of the DMS is often referred to as entropy coding. This 
section presents two examples of entropy coding. 


A. Shannon-Fano Coding: 


An efficient code can be obtained by the following simple procedure, known as 
Shannon-Fano algorithm: 


1. List the source symbols in order of decreasing probability. 

2. Partition the set into two sets that are as close to equiprobable as possible, and 
assign 0 to the upper set and 1 to the lower set. 

3. Continue this process, each time partitioning the sets with as nearly equal 
probabilities as possible until further partitioning is not possible. 


An example of Shannon-Fano encoding is shown in Table 10-2. Note that in 
Shannon-Fano encoding the ambiguity may arise in the choice of approximately 
equiprobable sets. (See Prob. 10.46.) Note also that Shannon-Fano coding results in 
suboptimal code. 


TABLE 10-2 Shannon-Fano Encoding

x_i    P(x_i)    STEP 1    STEP 2    STEP 3    STEP 4    CODE


H(X) = 2.36 b/symbol
L = 2.38 b/symbol
\eta = H(X)/L = 0.99
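The three steps of the Shannon-Fano procedure can be sketched as a short recursive routine. The six-symbol distribution below is an assumption: it is chosen because it reproduces the figures H(X) = 2.36 b/symbol and L = 2.38 b/symbol quoted above, not because it is recoverable from the table. The function name `shannon_fano` is illustrative, and ties between equally good partitions are resolved here by taking the first one found.

```python
import math


def shannon_fano(probabilities):
    """Return a dict symbol -> code word built by Shannon-Fano partitioning."""
    # Step 1: list the symbols in order of decreasing probability.
    symbols = sorted(probabilities, key=probabilities.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(probabilities[s] for s in group)
        running, best_cut, best_diff = 0.0, 1, float("inf")
        # Step 2: find the cut that makes the two sets most nearly equiprobable.
        for cut in range(1, len(group)):
            running += probabilities[group[cut - 1]]
            diff = abs(total - 2.0 * running)
            if diff < best_diff - 1e-12:
                best_diff, best_cut = diff, cut
        upper, lower = group[:best_cut], group[best_cut:]
        for s in upper:
            codes[s] += "0"
        for s in lower:
            codes[s] += "1"
        # Step 3: continue partitioning each set.
        split(upper)
        split(lower)

    split(symbols)
    return codes


# Assumed distribution (see the lead-in).
source = {"x1": 0.30, "x2": 0.25, "x3": 0.20, "x4": 0.12, "x5": 0.08, "x6": 0.05}
sf_code = shannon_fano(source)
sf_length = sum(source[s] * len(sf_code[s]) for s in source)
sf_entropy = -sum(p * math.log2(p) for p in source.values())
print(sf_code)
print(round(sf_entropy, 2), round(sf_length, 2), round(sf_entropy / sf_length, 2))
```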


B. Huffman Encoding: 


In general, Huffman encoding results in an optimum code. Thus, it is the code that 
has the highest efficiency (Prob. 10.47). The Huffman encoding procedure is as 
follows: 


1. List the source symbols in order of decreasing probability. 

2. Combine the probabilities of the two symbols having the lowest probabilities, 
and reorder the resultant probabilities; this step is called reduction 1. The same 
procedure is repeated until there are two ordered probabilities remaining. 

3. Start encoding with the last reduction, which consists of exactly two ordered 
probabilities. Assign 0 as the first digit in the code words for all the source 
symbols associated with the first probability; assign 1 to the second 
probability. 

4. Now go back and assign 0 and 1 to the second digit for the two probabilities
that were combined in the previous reduction step, retaining all assignments
made in Step 3.

5. Keep regressing this way until the first column is reached. 


An example of Huffman encoding is shown in Table 10-3.


H(X) = 2.36 b/symbol
L = 2.38 b/symbol
\eta = 0.99


TABLE 10-3 Huffman Encoding (columns: P(x_i), CODE)


Note that the Huffman code is not unique; it depends on how the Huffman tree is
built when probabilities tie and on how the labels 0 and 1 are assigned (see
Prob. 10.33).
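The reduction-and-backtracking procedure is equivalent to repeatedly merging the two least probable entries and prefixing a digit to every code word in each merged group. The sketch below does this with a priority queue; the function name `huffman_code` is illustrative, and the six-symbol distribution is an assumption chosen because it reproduces H(X) = 2.36 b/symbol and L = 2.38 b/symbol as quoted above. The particular 0/1 labeling is one of several equally valid choices.

```python
import heapq
import itertools


def huffman_code(probabilities):
    """Return a dict symbol -> binary code word for the given probabilities."""
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {s: ""}) for s, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in probabilities}
    while len(heap) > 1:
        # Combine the two lowest probabilities (one "reduction").
        p_low, _, group_low = heapq.heappop(heap)
        p_next, _, group_next = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in group_next.items()}
        merged.update({s: "1" + w for s, w in group_low.items()})
        heapq.heappush(heap, (p_low + p_next, next(counter), merged))
    return heap[0][2]


# Assumed distribution (see the lead-in).
source_probs = {"x1": 0.30, "x2": 0.25, "x3": 0.20, "x4": 0.12, "x5": 0.08, "x6": 0.05}
hf_code = huffman_code(source_probs)
hf_length = sum(source_probs[s] * len(hf_code[s]) for s in source_probs)
print(hf_code)
print(round(hf_length, 2))
```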


SOLVED PROBLEMS 


Measure of Information 


10.1. Consider an event E that occurs with probability p when a random experiment
is performed. Let I(p) be the information content (or surprise measure) of
event E, and assume that it satisfies the following axioms:

1. I(p) \ge 0
2. I(1) = 0
3. I(p) > I(q) if p < q
4. I(p) is a continuous function of p.
5. I(pq) = I(p) + I(q),    0 < p \le 1, 0 < q \le 1

Then show that I(p) can be expressed as

I(p) = -C \log_2 p    (10.79)

where C is an arbitrary positive constant.

From Axiom 5, we have

I(p^2) = I(pp) = I(p) + I(p) = 2I(p)

and by induction, we have

I(p^m) = mI(p)    (10.80)

Also, for any integer n, we have

I(p) = I(p^{1/n} \cdots p^{1/n}) = nI(p^{1/n})

and

I(p^{1/n}) = \frac{1}{n} I(p)    (10.81)

Thus, from Eqs. (10.80) and (10.81), we have

I(p^{m/n}) = \frac{m}{n} I(p)

or

I(p^r) = rI(p)

where r is any positive rational number. Then by Axiom 4

I(p^\alpha) = \alpha I(p)    (10.82)

where \alpha is any nonnegative number.
Let \alpha = -\log_2 p (0 < p \le 1). Then p = (1/2)^\alpha and from Eq. (10.82), we have

I(p) = I((1/2)^\alpha) = \alpha I(1/2) = -I(1/2) \log_2 p = -C \log_2 p

where C = I(1/2) > I(1) = 0 by Axioms 2 and 3. Setting C = 1, we have
I(p) = -\log_2 p bits.


10.2. Verify Eq. (10.5); that is,

I(x_i x_j) = I(x_i) + I(x_j)    if x_i and x_j are independent

If x_i and x_j are independent, then by Eq. (3.22)

P(x_i x_j) = P(x_i) P(x_j)

By Eq. (10.1)

I(x_i x_j) = \log \frac{1}{P(x_i x_j)} = \log \frac{1}{P(x_i) P(x_j)}
           = \log \frac{1}{P(x_i)} + \log \frac{1}{P(x_j)}
           = I(x_i) + I(x_j)

10.3. A DMS X has four symbols x_1, x_2, x_3, x_4 with probabilities P(x_1) = 0.4,
P(x_2) = 0.3, P(x_3) = 0.2, P(x_4) = 0.1.
(a) Calculate H(X).
(b) Find the amount of information contained in the messages x_1 x_2 x_1 x_3 and
x_4 x_3 x_3 x_2, and compare with the H(X) obtained in part (a).

(a) H(X) = -\sum_{i=1}^{4} P(x_i) \log_2 P(x_i)
         = -0.4 \log_2 0.4 - 0.3 \log_2 0.3 - 0.2 \log_2 0.2 - 0.1 \log_2 0.1
         = 1.85 b/symbol

(b) P(x_1 x_2 x_1 x_3) = (0.4)(0.3)(0.4)(0.2) = 0.0096
    I(x_1 x_2 x_1 x_3) = -\log_2 0.0096 = 6.70 b

Thus,

    I(x_1 x_2 x_1 x_3) < 7.4 b [= 4H(X)]

    P(x_4 x_3 x_3 x_2) = (0.1)(0.2)^2 (0.3) = 0.0012
    I(x_4 x_3 x_3 x_2) = -\log_2 0.0012 = 9.70 b

Thus,

    I(x_4 x_3 x_3 x_2) > 7.4 b [= 4H(X)]
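The arithmetic of Prob. 10.3 can be reproduced directly. This is a minimal sketch; the names `entropy_bits` and `message_information` are illustrative, and symbols within a message are treated as independent, as they are for a DMS.

```python
import math

# Symbol probabilities from Prob. 10.3.
P = {"x1": 0.4, "x2": 0.3, "x3": 0.2, "x4": 0.1}


def entropy_bits(probs):
    """H(X) in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def message_information(message, probs):
    """Information in bits of a message of independent symbols."""
    return -sum(math.log2(probs[s]) for s in message)


h_x = entropy_bits(P.values())
info_1 = message_information(["x1", "x2", "x1", "x3"], P)
info_2 = message_information(["x4", "x3", "x3", "x2"], P)
print(round(h_x, 2), round(info_1, 2), round(info_2, 2))
```

The first message is made of likely symbols and carries less than 4H(X); the second is made of unlikely symbols and carries more.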


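10.4. Consider a binary memoryless source X with two symbols x_1 and x_2. Show
that H(X) is maximum when both x_1 and x_2 are equiprobable.

Let P(x_1) = \alpha. Then P(x_2) = 1 - \alpha.

H(X) = -\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha)    (10.83)

\frac{dH(X)}{d\alpha} = \frac{d}{d\alpha} [-\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha)]

Using the relation

\frac{d}{dx} \log_b y = \frac{1}{y} \log_b e \frac{dy}{dx}

we obtain

\frac{dH(X)}{d\alpha} = -\log_2 \alpha + \log_2 (1 - \alpha) = \log_2 \frac{1 - \alpha}{\alpha}

The maximum value of H(X) requires that

\frac{dH(X)}{d\alpha} = 0

that is,

\frac{1 - \alpha}{\alpha} = 1 \rightarrow \alpha = \frac{1}{2}

Note that H(X) = 0 when \alpha = 0 or 1. When P(x_1) = P(x_2) = 1/2, H(X) is maximum
and is given by

H(X) = \frac{1}{2} \log_2 2 + \frac{1}{2} \log_2 2 = 1 b/symbol    (10.84)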


10.5. Verify Eq. (10.9); that is,

0 \le H(X) \le \log_2 m

where m is the size of the alphabet of X.

Proof of the lower bound: Since 0 \le P(x_i) \le 1,

\frac{1}{P(x_i)} \ge 1    and    \log_2 \frac{1}{P(x_i)} \ge 0

Then it follows that

P(x_i) \log_2 \frac{1}{P(x_i)} \ge 0

Thus,

H(X) = \sum_{i=1}^{m} P(x_i) \log_2 \frac{1}{P(x_i)} \ge 0    (10.85)

Next, we note that

P(x_i) \log_2 \frac{1}{P(x_i)} = 0

if and only if P(x_i) = 0 or 1. Since

\sum_{i=1}^{m} P(x_i) = 1

when P(x_i) = 1, then P(x_j) = 0 for j \ne i. Thus, only in this case, H(X) = 0.

Proof of the upper bound: Consider two probability distributions {P(x_i) = P_i}
and {Q(x_i) = Q_i} on the alphabet {x_i}, i = 1, 2, ..., m, such that

\sum_{i=1}^{m} P_i = 1    and    \sum_{i=1}^{m} Q_i = 1    (10.86)

Using Eq. (10.6), we have

\sum_{i=1}^{m} P_i \log_2 \frac{Q_i}{P_i} = \frac{1}{\ln 2} \sum_{i=1}^{m} P_i \ln \frac{Q_i}{P_i}

Next, using the inequality

\ln \alpha \le \alpha - 1    \alpha \ge 0    (10.87)

and noting that the equality holds only if \alpha = 1, we get

\sum_{i=1}^{m} P_i \ln \frac{Q_i}{P_i} \le \sum_{i=1}^{m} P_i \left( \frac{Q_i}{P_i} - 1 \right) = \sum_{i=1}^{m} (Q_i - P_i)
    = \sum_{i=1}^{m} Q_i - \sum_{i=1}^{m} P_i = 0    (10.88)

by using Eq. (10.86). Thus,

\sum_{i=1}^{m} P_i \log_2 \frac{Q_i}{P_i} \le 0    (10.89)

where the equality holds only if Q_i = P_i for all i. Setting

Q_i = \frac{1}{m}    i = 1, 2, ..., m    (10.90)

we obtain

\sum_{i=1}^{m} P_i \log_2 \frac{1}{P_i m} = -\sum_{i=1}^{m} P_i \log_2 P_i - \sum_{i=1}^{m} P_i \log_2 m
    = H(X) - \log_2 m \sum_{i=1}^{m} P_i
    = H(X) - \log_2 m \le 0    (10.91)

Hence,

H(X) \le \log_2 m

and the equality holds only if the symbols in X are equiprobable, as in Eq.
(10.90).


10.6. Find the discrete probability distribution of X, p_X(x_i), which maximizes
information entropy H(X).

From Eqs. (10.7) and (2.17), we have

H(X) = -\sum_{i=1}^{m} p_X(x_i) \ln p_X(x_i)    (10.92)

\sum_{i=1}^{m} p_X(x_i) = 1    (10.93)

Thus, the problem is the maximization of Eq. (10.92) with constraint Eq.
(10.93). Then using the method of Lagrange multipliers, we set Lagrangian J
as

J = -\sum_{i=1}^{m} p_X(x_i) \ln p_X(x_i) + \lambda \left( \sum_{i=1}^{m} p_X(x_i) - 1 \right)    (10.94)

where \lambda is the Lagrangian multiplier. Taking the derivative of J with respect
to p_X(x_i) and setting equal to zero, we obtain

\frac{\partial J}{\partial p_X(x_i)} = -\ln p_X(x_i) - 1 + \lambda = 0

and

\ln p_X(x_i) = \lambda - 1 \rightarrow p_X(x_i) = e^{\lambda - 1}    (10.95)

This shows that all p_X(x_i) are equal (because they depend on \lambda only) and
using the constraint Eq. (10.93), we obtain p_X(x_i) = 1/m. Hence, the uniform
distribution is the distribution with the maximum entropy. (Cf. Prob. 10.5.)


10.7. A high-resolution black-and-white TV picture consists of about 2 x 10^6
picture elements and 16 different brightness levels. Pictures are repeated at
the rate of 32 per second. All picture elements are assumed to be
independent, and all levels have equal likelihood of occurrence. Calculate
the average rate of information conveyed by this TV picture source.

H(X) = -\sum_{i=1}^{16} \frac{1}{16} \log_2 \frac{1}{16} = 4 b/element

r = 2(10^6)(32) = 64(10^6) elements/s

Hence, by Eq. (10.10)

R = rH(X) = 64(10^6)(4) = 256(10^6) b/s = 256 Mb/s

10.8. Consider a telegraph source having two symbols, dot and dash. The dot
duration is 0.2 s. The dash duration is 3 times the dot duration. The
probability of the dot's occurring is twice that of the dash, and the time
between symbols is 0.2 s. Calculate the information rate of the telegraph
source.

P(dot) = 2P(dash)
P(dot) + P(dash) = 3P(dash) = 1

Thus,

P(dash) = \frac{1}{3}    and    P(dot) = \frac{2}{3}

By Eq. (10.7)

H(X) = -P(dot) \log_2 P(dot) - P(dash) \log_2 P(dash)
     = 0.667(0.585) + 0.333(1.585) = 0.92 b/symbol

t_{dot} = 0.2 s    t_{dash} = 0.6 s    t_{space} = 0.2 s

Thus, the average time per symbol is

T_s = P(dot) t_{dot} + P(dash) t_{dash} + t_{space} = 0.5333 s/symbol

and the average symbol rate is

r = \frac{1}{T_s} = 1.875 symbols/s

Thus, the average information rate of the telegraph source is

R = rH(X) = 1.875(0.92) = 1.725 b/s
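The calculation in Prob. 10.8 can be repeated without intermediate rounding. The sketch below uses illustrative variable names; carrying full precision gives a rate of about 1.72 b/s, slightly below the 1.725 b/s obtained above with H(X) rounded to 0.92.

```python
import math

# Data from Prob. 10.8.
p_dot, p_dash = 2.0 / 3.0, 1.0 / 3.0
t_dot, t_dash, t_space = 0.2, 0.6, 0.2  # seconds

# Source entropy in bits per symbol, Eq. (10.7).
h_telegraph = -p_dot * math.log2(p_dot) - p_dash * math.log2(p_dash)

# Average time per symbol, including the space that follows each symbol.
t_symbol = p_dot * t_dot + p_dash * t_dash + t_space

symbol_rate = 1.0 / t_symbol                   # symbols per second
information_rate = symbol_rate * h_telegraph   # bits per second

print(round(h_telegraph, 3), round(t_symbol, 4),
      round(symbol_rate, 3), round(information_rate, 3))
```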
Discrete Memoryless Channels 


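10.9. Consider a binary channel shown in Fig. 10-8.


Fig. 10-8


(a) Find the channel matrix of the channel.
(b) Find P(y_1) and P(y_2) when P(x_1) = P(x_2) = 0.5.
(c) Find the joint probabilities P(x_1, y_2) and P(x_2, y_1) when
P(x_1) = P(x_2) = 0.5.

(a) Using Eq. (10.11), we see the channel matrix is given by

[P(Y | X)] = \begin{bmatrix} P(y_1 | x_1) & P(y_2 | x_1) \\ P(y_1 | x_2) & P(y_2 | x_2) \end{bmatrix} = \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}

(b) Using Eqs. (10.13), (10.14), and (10.15), we obtain

[P(Y)] = [P(X)] [P(Y | X)]
       = [0.5  0.5] \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}
       = [0.55  0.45] = [P(y_1)  P(y_2)]

Hence, P(y_1) = 0.55 and P(y_2) = 0.45.

(c) Using Eqs. (10.16) and (10.17), we obtain

[P(X, Y)] = [P(X)]_d [P(Y | X)]
          = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}
          = \begin{bmatrix} 0.45 & 0.05 \\ 0.1 & 0.4 \end{bmatrix} = \begin{bmatrix} P(x_1, y_1) & P(x_1, y_2) \\ P(x_2, y_1) & P(x_2, y_2) \end{bmatrix}

Hence, P(x_1, y_2) = 0.05 and P(x_2, y_1) = 0.1.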


10.10. Two binary channels of Prob. 10.9 are connected in a cascade, as shown in
Fig. 10-9.


Fig. 10-9


(a) Find the overall channel matrix of the resultant channel, and draw the
resultant equivalent channel diagram.
(b) Find P(z_1) and P(z_2) when P(x_1) = P(x_2) = 0.5.

(a) By Eq. (10.15)

[P(Y)] = [P(X)] [P(Y | X)]
[P(Z)] = [P(Y)] [P(Z | Y)]
       = [P(X)] [P(Y | X)] [P(Z | Y)]
       = [P(X)] [P(Z | X)]

Thus, from Fig. 10-9

[P(Z | X)] = [P(Y | X)] [P(Z | Y)]
           = \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix} \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix} = \begin{bmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{bmatrix}

The resultant equivalent channel diagram is shown in Fig. 10-10.


Fig. 10-10


(b) [P(Z)] = [P(X)] [P(Z | X)]
           = [0.5  0.5] \begin{bmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{bmatrix} = [0.585  0.415]

Hence, P(z_1) = 0.585 and P(z_2) = 0.415.
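The cascade result is a plain matrix product, so it can be checked with a few lines of code. This is a minimal sketch in pure Python; the helper name `mat_mul` is illustrative.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Channel matrix of Prob. 10.9, used for both stages of the cascade.
channel = [[0.9, 0.1],
           [0.2, 0.8]]

overall = mat_mul(channel, channel)       # [P(Z | X)] = [P(Y | X)] [P(Z | Y)]
p_z = mat_mul([[0.5, 0.5]], overall)[0]   # [P(Z)] = [P(X)] [P(Z | X)]

print([[round(v, 2) for v in row] for row in overall])
print([round(v, 3) for v in p_z])
```

Each row of the overall matrix still sums to 1, as required of a channel matrix by Eq. (10.12).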


10.11. A channel has the following channel matrix:

[P(Y | X)] = \begin{bmatrix} 1 - p & p & 0 \\ 0 & p & 1 - p \end{bmatrix}    (10.96)

(a) Draw the channel diagram.
(b) If the source has equally likely outputs, compute the probabilities
associated with the channel outputs for p = 0.2.

(a) The channel diagram is shown in Fig. 10-11. Note that the channel
represented by Eq. (10.96) (see Fig. 10-11) is known as the binary
erasure channel. The binary erasure channel has two inputs x_1 = 0 and
x_2 = 1 and three outputs y_1 = 0, y_2 = e, and y_3 = 1, where e indicates an
erasure; that is, the output is in doubt, and it should be erased.


Fig. 10-11 Binary erasure channel.


(b) By Eq. (10.15)

[P(Y)] = [0.5  0.5] \begin{bmatrix} 0.8 & 0.2 & 0 \\ 0 & 0.2 & 0.8 \end{bmatrix}
       = [0.4  0.2  0.4]

Thus, P(y_1) = 0.4, P(y_2) = 0.2, and P(y_3) = 0.4.


Mutual Information 


10.12. For a lossless channel show that

H(X | Y) = 0    (10.97)

When we observe the output y_j in a lossless channel (Fig. 10-2), it is clear
which x_i was transmitted; that is,

P(x_i | y_j) = 0 or 1    (10.98)

Now by Eq. (10.23)

H(X | Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(x_i | y_j)
         = -\sum_{j=1}^{n} P(y_j) \sum_{i=1}^{m} P(x_i | y_j) \log_2 P(x_i | y_j)    (10.99)

Note that all the terms in the inner summation are zero because they are in
the form of 1 x \log_2 1 or 0 x \log_2 0. Hence, we conclude that for a lossless
channel

H(X | Y) = 0


10.13. Consider a noiseless channel with m input symbols and m output symbols
(Fig. 10-4). Show that

H(X) = H(Y)    (10.100)

and

H(Y | X) = 0    (10.101)

For a noiseless channel the transition probabilities are

P(y_j | x_i) = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}    (10.102)

Hence,

P(x_i, y_j) = P(y_j | x_i) P(x_i) = \begin{cases} P(x_i) & i = j \\ 0 & i \ne j \end{cases}    (10.103)

and

P(y_j) = \sum_{i=1}^{m} P(x_i, y_j) = P(x_j)    (10.104)

Thus, by Eqs. (10.7) and (10.104)

H(Y) = -\sum_{j=1}^{m} P(y_j) \log_2 P(y_j)
     = -\sum_{i=1}^{m} P(x_i) \log_2 P(x_i) = H(X)

Next, by Eqs. (10.24), (10.102), and (10.103)

H(Y | X) = -\sum_{j=1}^{m} \sum_{i=1}^{m} P(x_i, y_j) \log_2 P(y_j | x_i)
         = -\sum_{i=1}^{m} P(x_i, y_i) \log_2 P(y_i | x_i)
         = -\sum_{i=1}^{m} P(x_i) \log_2 1 = 0


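10.14. Verify Eq. (10.26); that is,

H(X, Y) = H(X | Y) + H(Y)

From Eqs. (3.34) and (3.37)

P(x_i, y_j) = P(x_i | y_j) P(y_j)

and

\sum_{i=1}^{m} P(x_i, y_j) = P(y_j)

So by Eq. (10.25) and using Eqs. (10.22) and (10.23), we have

H(X, Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log P(x_i, y_j)
        = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log [P(x_i | y_j) P(y_j)]
        = -\sum_{j=1}^{n} \sum_{i=1}^{m} P(x_i, y_j) \log P(x_i | y_j)
          - \sum_{j=1}^{n} \left[ \sum_{i=1}^{m} P(x_i, y_j) \right] \log P(y_j)
        = H(X | Y) - \sum_{j=1}^{n} P(y_j) \log P(y_j)
        = H(X | Y) + H(Y)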


10.15. Verify Eq. (10.40); that is,

D(p // q) \ge 0

By definition of D(p // q), Eq. (10.39), we have

D(p // q) = \sum_{i} p(x_i) \log_2 \frac{p(x_i)}{q(x_i)} = E \left[ -\log_2 \frac{q(x_i)}{p(x_i)} \right]    (10.105)

Since minus the logarithm is convex, then by Jensen's inequality (Eq. 4.40),
we obtain

D(p // q) = E \left[ -\log_2 \frac{q(x_i)}{p(x_i)} \right] \ge -\log_2 E \left[ \frac{q(x_i)}{p(x_i)} \right]

Now

-\log_2 E \left[ \frac{q(x_i)}{p(x_i)} \right] = -\log_2 \left[ \sum_{i} p(x_i) \frac{q(x_i)}{p(x_i)} \right] = -\log_2 \left[ \sum_{i} q(x_i) \right] = -\log_2 1 = 0

Thus, D(p // q) \ge 0. Next, when p(x_i) = q(x_i), then \log_2 (q/p) = \log_2 1 = 0,
and we have D(p // q) = 0. On the other hand, if D(p // q) = 0, then
\log_2 (q/p) = 0, which implies that q/p = 1; that is, p(x_i) = q(x_i). Thus, we have
shown that D(p // q) \ge 0 and equality holds iff p(x_i) = q(x_i).


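10.16. Verify Eq. (10.41); that is,

I(X; Y) = D(p_{XY}(x, y) // p_X(x) p_Y(y))

From Eq. (10.31) and using Eqs. (10.21) and (10.23), we obtain

I(X; Y) = H(X) - H(X | Y)
        = -\sum_{i} p_X(x_i) \log_2 p_X(x_i) + \sum_{i} \sum_{j} p_{XY}(x_i, y_j) \log_2 p_{X|Y}(x_i | y_j)
        = -\sum_{i} \left[ \sum_{j} p_{XY}(x_i, y_j) \right] \log_2 p_X(x_i) + \sum_{i} \sum_{j} p_{XY}(x_i, y_j) \log_2 p_{X|Y}(x_i | y_j)
        = \sum_{i} \sum_{j} p_{XY}(x_i, y_j) \log_2 \frac{p_{X|Y}(x_i | y_j)}{p_X(x_i)}
        = \sum_{i} \sum_{j} p_{XY}(x_i, y_j) \log_2 \frac{p_{XY}(x_i, y_j)}{p_X(x_i) p_Y(y_j)} = D(p_{XY}(x, y) // p_X(x) p_Y(y))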


10.17. Verify Eq. (10.32); that is,

I(X; Y) = I(Y; X)

Since p_{XY}(x, y) = p_{YX}(y, x) and p_X(x) p_Y(y) = p_Y(y) p_X(x), by Eq. (10.41), we
have

I(X; Y) = D(p_{XY}(x, y) // p_X(x) p_Y(y))
        = D(p_{YX}(y, x) // p_Y(y) p_X(x)) = I(Y; X)


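10.18. Verify Eq. (10.33); that is,

I(X; Y) \ge 0

From Eqs. (10.41) and (10.40), we have

I(X; Y) = D(p_{XY}(x, y) // p_X(x) p_Y(y)) \ge 0


10.19. Using Eq. (10.42), verify Eq. (10.35); that is,

I(X; Y) = H(X) + H(Y) - H(X, Y)

From Eq. (10.42) we have

I(X; Y) = D(p_{XY}(x, y) // p_X(x) p_Y(y))
        = \sum_{x} \sum_{y} p_{XY}(x, y) \log_2 \frac{p_{XY}(x, y)}{p_X(x) p_Y(y)}
        = \sum_{x} \sum_{y} p_{XY}(x, y) [\log_2 p_{XY}(x, y) - \log_2 p_X(x) - \log_2 p_Y(y)]
        = -H(X, Y) - \sum_{x} \sum_{y} p_{XY}(x, y) \log_2 p_X(x) - \sum_{x} \sum_{y} p_{XY}(x, y) \log_2 p_Y(y)
        = -H(X, Y) - \sum_{x} \log_2 p_X(x) \left[ \sum_{y} p_{XY}(x, y) \right] - \sum_{y} \log_2 p_Y(y) \left[ \sum_{x} p_{XY}(x, y) \right]
        = -H(X, Y) - \sum_{x} p_X(x) \log_2 p_X(x) - \sum_{y} p_Y(y) \log_2 p_Y(y)
        = -H(X, Y) + H(X) + H(Y)
        = H(X) + H(Y) - H(X, Y)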


10.20. Consider a BSC (Fig. 10-5) with P(x_1) = \alpha.
(a) Show that the mutual information I(X; Y) is given by

I(X; Y) = H(Y) + p \log_2 p + (1 - p) \log_2 (1 - p)    (10.106)

(b) Calculate I(X; Y) for \alpha = 0.5 and p = 0.1.
(c) Repeat (b) for \alpha = 0.5 and p = 0.5, and comment on the result.

Figure 10-12 shows the diagram of the BSC with associated input
probabilities.


Fig. 10-12


(a) Using Eqs. (10.16), (10.17), and (10.20), we have

[P(X, Y)] = \begin{bmatrix} \alpha & 0 \\ 0 & 1 - \alpha \end{bmatrix} \begin{bmatrix} 1 - p & p \\ p & 1 - p \end{bmatrix}
          = \begin{bmatrix} \alpha(1 - p) & \alpha p \\ (1 - \alpha)p & (1 - \alpha)(1 - p) \end{bmatrix} = \begin{bmatrix} P(x_1, y_1) & P(x_1, y_2) \\ P(x_2, y_1) & P(x_2, y_2) \end{bmatrix}

Then by Eq. (10.24)

H(Y | X) = -P(x_1, y_1) \log_2 P(y_1 | x_1) - P(x_1, y_2) \log_2 P(y_2 | x_1)
           - P(x_2, y_1) \log_2 P(y_1 | x_2) - P(x_2, y_2) \log_2 P(y_2 | x_2)
         = -\alpha(1 - p) \log_2 (1 - p) - \alpha p \log_2 p
           - (1 - \alpha)p \log_2 p - (1 - \alpha)(1 - p) \log_2 (1 - p)
         = -p \log_2 p - (1 - p) \log_2 (1 - p)    (10.107)

Hence, by Eq. (10.34)

I(X; Y) = H(Y) - H(Y | X)
        = H(Y) + p \log_2 p + (1 - p) \log_2 (1 - p)

(b) When \alpha = 0.5 and p = 0.1, by Eq. (10.15)

[P(Y)] = [0.5  0.5] \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix} = [0.5  0.5]

Thus, P(y_1) = P(y_2) = 0.5.
By Eq. (10.22)

H(Y) = -P(y_1) \log_2 P(y_1) - P(y_2) \log_2 P(y_2)
     = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1

p \log_2 p + (1 - p) \log_2 (1 - p) = 0.1 \log_2 0.1 + 0.9 \log_2 0.9
                                    = -0.469

Thus,

I(X; Y) = 1 - 0.469 = 0.531

(c) When \alpha = 0.5 and p = 0.5,

[P(Y)] = [0.5  0.5] \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} = [0.5  0.5]

so H(Y) = 1, and

p \log_2 p + (1 - p) \log_2 (1 - p) = 0.5 \log_2 0.5 + 0.5 \log_2 0.5
                                    = -1

Thus,

I(X; Y) = 1 - 1 = 0

Note that in this case (p = 0.5) no information is being transmitted at
all. An equally acceptable decision could be made by dispensing with the
channel entirely and "flipping a coin" at the receiver. When I(X; Y) = 0,
the channel is said to be useless.


Channel Capacity 
10.21. Verify Eq. (10.46); that is,

C_s = \log_2 m

where C_s is the channel capacity of a lossless channel and m is the number of
symbols in X.

For a lossless channel [Eq. (10.97), Prob. 10.12]

H(X | Y) = 0

Then by Eq. (10.31)

I(X; Y) = H(X) - H(X | Y) = H(X)    (10.108)

Hence, by Eqs. (10.43) and (10.9)

C_s = \max_{\{P(x_i)\}} I(X; Y) = \max_{\{P(x_i)\}} H(X) = \log_2 m


10.22. Verify Eq. (10.52); that is,

C_s = 1 + p \log_2 p + (1 - p) \log_2 (1 - p)

where C_s is the channel capacity of a BSC (Fig. 10-5).

By Eq. (10.106) (Prob. 10.20) the mutual information I(X; Y) of a BSC is
given by

I(X; Y) = H(Y) + p \log_2 p + (1 - p) \log_2 (1 - p)

which is maximum when H(Y) is maximum. Since the channel output is
binary, H(Y) is maximum when each output has a probability of 0.5, which is
achieved for equally likely inputs [Eq. (10.9)]. For this case H(Y) = 1, and
the channel capacity is

C_s = \max_{\{P(x_i)\}} I(X; Y) = 1 + p \log_2 p + (1 - p) \log_2 (1 - p)


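10.23. Find the channel capacity of the binary erasure channel of Fig. 10-13 (Prob.
10.11).


Fig. 10-13


Let P(x_1) = \alpha. Then P(x_2) = 1 - \alpha. By Eq. (10.96)

[P(Y | X)] = \begin{bmatrix} 1 - p & p & 0 \\ 0 & p & 1 - p \end{bmatrix} = \begin{bmatrix} P(y_1 | x_1) & P(y_2 | x_1) & P(y_3 | x_1) \\ P(y_1 | x_2) & P(y_2 | x_2) & P(y_3 | x_2) \end{bmatrix}

By Eq. (10.15)

[P(Y)] = [\alpha  1 - \alpha] \begin{bmatrix} 1 - p & p & 0 \\ 0 & p & 1 - p \end{bmatrix}
       = [\alpha(1 - p)  p  (1 - \alpha)(1 - p)]
       = [P(y_1)  P(y_2)  P(y_3)]

By Eq. (10.17)

[P(X, Y)] = \begin{bmatrix} \alpha & 0 \\ 0 & 1 - \alpha \end{bmatrix} \begin{bmatrix} 1 - p & p & 0 \\ 0 & p & 1 - p \end{bmatrix}
          = \begin{bmatrix} \alpha(1 - p) & \alpha p & 0 \\ 0 & (1 - \alpha)p & (1 - \alpha)(1 - p) \end{bmatrix}
          = \begin{bmatrix} P(x_1, y_1) & P(x_1, y_2) & P(x_1, y_3) \\ P(x_2, y_1) & P(x_2, y_2) & P(x_2, y_3) \end{bmatrix}

In addition, from Eqs. (10.22) and (10.24) we can calculate

H(Y) = -\sum_{j=1}^{3} P(y_j) \log_2 P(y_j)
     = -\alpha(1 - p) \log_2 [\alpha(1 - p)] - p \log_2 p - (1 - \alpha)(1 - p) \log_2 [(1 - \alpha)(1 - p)]
     = (1 - p)[-\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha)]
       - p \log_2 p - (1 - p) \log_2 (1 - p)    (10.109)

H(Y | X) = -\sum_{j=1}^{3} \sum_{i=1}^{2} P(x_i, y_j) \log_2 P(y_j | x_i)
         = -\alpha(1 - p) \log_2 (1 - p) - \alpha p \log_2 p
           - (1 - \alpha)p \log_2 p - (1 - \alpha)(1 - p) \log_2 (1 - p)
         = -p \log_2 p - (1 - p) \log_2 (1 - p)    (10.110)

Thus, by Eqs. (10.34) and (10.83)

I(X; Y) = H(Y) - H(Y | X)
        = (1 - p)[-\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha)]
        = (1 - p) H(X)    (10.111)

And by Eqs. (10.43) and (10.84)

C_s = \max_{\{P(x_i)\}} I(X; Y) = \max_{\{P(x_i)\}} (1 - p) H(X) = (1 - p) \max_{\{P(x_i)\}} H(X) = 1 - p    (10.112)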
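The result C_s = 1 - p can be cross-checked by brute force: evaluate I(X; Y) from Eq. (10.42) over a grid of input probabilities and take the largest value. The sketch below does this for p = 0.2; the function name `mutual_information` and the grid resolution are illustrative choices.

```python
import math


def mutual_information(px, channel):
    """I(X; Y) in bits, computed from the joint pmf as in Eq. (10.42)."""
    m, n = len(px), len(channel[0])
    joint = [[px[i] * channel[i][j] for j in range(n)] for i in range(m)]
    py = [sum(joint[i][j] for i in range(m)) for j in range(n)]
    return sum(joint[i][j] * math.log2(joint[i][j] / (px[i] * py[j]))
               for i in range(m) for j in range(n) if joint[i][j] > 0)


p = 0.2  # erasure probability, as in Prob. 10.11(b)
bec = [[1 - p, p, 0.0],
       [0.0, p, 1 - p]]

# Search over P(x_1) = alpha on a uniform grid.
grid = [k / 1000 for k in range(1001)]
best_alpha = max(grid, key=lambda a: mutual_information([a, 1 - a], bec))
capacity = mutual_information([best_alpha, 1 - best_alpha], bec)
print(best_alpha, round(capacity, 6))
```

The maximum is found at equally likely inputs, in agreement with Eq. (10.112).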


Continuous Channel 


10.24. Find the differential entropy H(X) of the uniformly distributed random
variable X with probability density function


f_X(x) = \begin{cases} \frac{1}{a} & 0 \le x \le a \\ 0 & \text{otherwise} \end{cases}


for (a) a = 1, (b) a = 2, and (c) a = 1/2.


By Eq. (10.53) 
H(X) = -\int_{-\infty}^{\infty} f_X(x) \log_2 f_X(x) dx
     = -\int_{0}^{a} \frac{1}{a} \log_2 \frac{1}{a} dx = \log_2 a    (10.113)


(a) a = 1, H(X) = \log_2 1 = 0
(b) a = 2, H(X) = \log_2 2 = 1
(c) a = 1/2, H(X) = \log_2 (1/2) = -\log_2 2 = -1


Note that the differential entropy H(X) is not an absolute measure of
information, and unlike discrete entropy, differential entropy can be
negative.
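

The three results can be reproduced by numerical integration. The sketch below is a minimal illustration; the helper `differential_entropy_bits` and its midpoint-rule step count are assumptions made for this example. It integrates -f log_2 f over the support of the uniform pdf.

```python
import math


def differential_entropy_bits(pdf, lo: float, hi: float, steps: int = 10_000) -> float:
    """Approximate -integral of f(x) log2 f(x) over [lo, hi] by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        f = pdf(lo + (k + 0.5) * dx)
        if f > 0:  # points where f = 0 contribute nothing
            total -= f * math.log2(f) * dx
    return total


if __name__ == "__main__":
    for a in (1.0, 2.0, 0.5):
        h = differential_entropy_bits(lambda x, a=a: 1.0 / a, 0.0, a)
        print(f"a = {a}: H(X) = {h:.4f}, log2(a) = {math.log2(a):.4f}")
```

For a = 1/2 the integral is negative, in line with the note above.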


10.25. Find the probability density function f(x) of X for which differential 
entropy H(X) is maximum. 


Let the support of f(x) be (a, b). [Note that the support of f(x) is the 
region where f(x) > 0.] Now 


H(X) = -\int_{a}^{b} f_X(x) \ln f_X(x) dx    (10.114)


and


\int_{a}^{b} f_X(x) dx = 1    (10.115)


Using Lagrangian multipliers technique, we let


J = -\int_{a}^{b} f_X(x) \ln f_X(x) dx + \lambda \left( \int_{a}^{b} f_X(x) dx - 1 \right)    (10.116)


where λ is the Lagrangian multiplier. Taking the functional derivative of J
with respect to f_X(x) and setting equal to zero, we obtain


\frac{\partial J}{\partial f_X(x)} = -\ln f_X(x) - 1 + \lambda = 0    (10.117)
or
f_X(x) = e^{\lambda - 1}    (10.118)


so f_X(x) is a constant. Then, by constraint Eq. (10.115), we obtain


f_X(x) = \frac{1}{b - a}    a < x < b    (10.119)


Thus, the uniform distribution results in the maximum differential entropy.


10.26. Let X be N(0; σ²). Find its differential entropy.


The differential entropy in nats is expressed as 


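H(X) = -\int_{-\infty}^{\infty} f_X(x) \ln f_X(x) dx    (10.120)


By Eq. (2.71), the pdf of N(0; σ²) is


f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/(2\sigma^2)}    (10.121)


Then


\ln f_X(x) = -\frac{x^2}{2\sigma^2} - \ln \sqrt{2\pi\sigma^2}    (10.122)


Thus, we obtain


H(X) = -\int_{-\infty}^{\infty} f_X(x) \left[ -\frac{x^2}{2\sigma^2} - \ln \sqrt{2\pi\sigma^2} \right] dx
     = \frac{1}{2\sigma^2} E(X^2) + \frac{1}{2} \ln (2\pi\sigma^2) = \frac{1}{2} + \frac{1}{2} \ln (2\pi\sigma^2)
     = \frac{1}{2} \ln e + \frac{1}{2} \ln (2\pi\sigma^2) = \frac{1}{2} \ln (2\pi e \sigma^2)  nats/sample    (10.123)


Changing the base of the logarithm, we have


H(X) = \frac{1}{2} \log_2 (2\pi e \sigma^2)  bits/sample    (10.124)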
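

Equation (10.124) can be compared against a direct numerical integration of -f log_2 f. The following is a sketch under stated assumptions: the helper names, the truncation of the integral to plus or minus 12 standard deviations, and the value σ = 2 are all choices made for this illustration.

```python
import math


def gaussian_pdf(x: float, sigma: float) -> float:
    """Pdf of N(0; sigma^2)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / math.sqrt(2 * math.pi * sigma * sigma)


def numeric_entropy_bits(sigma: float, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of the differential entropy over [-12 sigma, 12 sigma]."""
    lo, hi = -12 * sigma, 12 * sigma
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        f = gaussian_pdf(lo + (k + 0.5) * dx, sigma)
        if f > 0:
            total -= f * math.log2(f) * dx
    return total


def closed_form_entropy_bits(sigma: float) -> float:
    """Eq. (10.124): (1/2) log2(2 pi e sigma^2)."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma * sigma)


if __name__ == "__main__":
    sigma = 2.0  # illustrative value
    print(numeric_entropy_bits(sigma), closed_form_entropy_bits(sigma))
```

The two printed values should agree to several decimal places.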


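10.27. Let (X_1, ..., X_n) be an n-variate r.v. defined by Eq. (3.92). Find the
differential entropy of (X_1, ..., X_n).


From Eq. (3.92) the joint pdf of (X_1, ..., X_n) is given by


f_X(x) = \frac{1}{(2\pi)^{n/2} |K|^{1/2}} \exp \left[ -\frac{1}{2} (x - \mu)^T K^{-1} (x - \mu) \right]    (10.125)


where


\mu = E(X) = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \begin{bmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{bmatrix}    (10.126)


K = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{bmatrix} = [\sigma_{ij}]_{n \times n}    \sigma_{ij} = Cov(X_i, X_j)    (10.127)


Then, we obtain


H(X) = -\int f_X(x) \ln f_X(x) dx
     = -\int f_X(x) \left[ -\frac{1}{2} (x - \mu)^T K^{-1} (x - \mu) - \ln \left( (2\pi)^{n/2} |K|^{1/2} \right) \right] dx
     = \frac{1}{2} E \left[ \sum_{i,j} (X_i - \mu_i)(K^{-1})_{ij}(X_j - \mu_j) \right] + \frac{1}{2} \ln (2\pi)^n |K|
     = \frac{1}{2} \sum_{i,j} E[(X_j - \mu_j)(X_i - \mu_i)] (K^{-1})_{ij} + \frac{1}{2} \ln (2\pi)^n |K|
     = \frac{1}{2} \sum_{j} \sum_{i} K_{ji} (K^{-1})_{ij} + \frac{1}{2} \ln (2\pi)^n |K|
     = \frac{1}{2} \sum_{j} (K K^{-1})_{jj} + \frac{1}{2} \ln (2\pi)^n |K|
     = \frac{1}{2} \sum_{j} I_{jj} + \frac{1}{2} \ln (2\pi)^n |K|
     = \frac{n}{2} + \frac{1}{2} \ln (2\pi)^n |K|
     = \frac{1}{2} \ln (2\pi e)^n |K|  nats/sample    (10.128)
     = \frac{1}{2} \log_2 (2\pi e)^n |K|  bits/sample    (10.129)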


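10.28. Find the probability density function f_X(x) of X with zero mean and
variance σ² for which differential entropy H(X) is maximum.


The differential entropy in nats is expressed as


H(X) = -\int_{-\infty}^{\infty} f_X(x) \ln f_X(x) dx    (10.130)


The constraints are


\int_{-\infty}^{\infty} f_X(x) dx = 1    (10.131)
\int_{-\infty}^{\infty} f_X(x) x dx = 0    (10.132)
\int_{-\infty}^{\infty} f_X(x) x^2 dx = \sigma^2    (10.133)


Using the method of Lagrangian multipliers, and setting Lagrangian as


J = -\int_{-\infty}^{\infty} f_X(x) \ln f_X(x) dx + \lambda_0 \left( \int_{-\infty}^{\infty} f_X(x) dx - 1 \right)
    + \lambda_1 \left( \int_{-\infty}^{\infty} f_X(x) x dx \right) + \lambda_2 \left( \int_{-\infty}^{\infty} f_X(x) x^2 dx - \sigma^2 \right)


and taking the functional derivative of J with respect to f_X(x) and setting
equal to zero, we obtain


\frac{\partial J}{\partial f_X(x)} = -\ln f_X(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 x^2 = 0
or
\ln f_X(x) = \lambda_0 - 1 + \lambda_1 x + \lambda_2 x^2
or
f_X(x) = e^{\lambda_0 - 1 + \lambda_1 x + \lambda_2 x^2} = C(\lambda_0) e^{\lambda_1 x + \lambda_2 x^2}    (10.134)


It is obvious from the generic form of the exponential family restricted to
second-order polynomials that it covers the Gaussian distribution only. Thus,
we obtain N(0; σ²) and


f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/(2\sigma^2)}    -\infty < x < \infty    (10.135)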


10.29. Verify Eq. (10.55); that is, 
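H(aX) = H(X) + \log_2 |a|


Let Y = aX. Then by Eq. (4.86)


f_Y(y) = \frac{1}{|a|} f_X \left( \frac{y}{a} \right)


and


H(aX) = H(Y) = -\int_{-\infty}^{\infty} f_Y(y) \log_2 f_Y(y) dy
      = -\int_{-\infty}^{\infty} \frac{1}{|a|} f_X \left( \frac{y}{a} \right) \log_2 \left[ \frac{1}{|a|} f_X \left( \frac{y}{a} \right) \right] dy


After a change of variables y/a = x, we have


H(aX) = -\int_{-\infty}^{\infty} f_X(x) \log_2 f_X(x) dx - \log_2 \frac{1}{|a|} \int_{-\infty}^{\infty} f_X(x) dx
      = H(X) + \log_2 |a|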


10.30. Verify Eq. (10.64); that is, 
D(f // g) \ge 0


By definition (10.61), we have


-D(f // g) = \int_{-\infty}^{\infty} f(x) \log_2 \left[ \frac{g(x)}{f(x)} \right] dx
           \le \log_2 \int_{-\infty}^{\infty} f(x) \left[ \frac{g(x)}{f(x)} \right] dx    (by Jensen's inequality (4.40))
           = \log_2 \int_{-\infty}^{\infty} g(x) dx = \log_2 1 = 0


Thus, D(f // g) \ge 0. Equality in Jensen's inequality occurs iff f = g.


Additive White Gaussian Noise Channel 
10.31. Verify Eq. (10.71); that is, 


C_s = \max_{\{f_X(x)\}} I(X; Y) = \frac{1}{2} \log_2 \left( 1 + \frac{S}{N} \right)


From Eq. (10.68), Y = X + Z. Then by Eq. (10.57) we have


I(X; Y) = H(Y) - H(Y \mid X) = H(Y) - H(X + Z \mid X)
        = H(Y) - H(Z \mid X) = H(Y) - H(Z)    (10.136)


since Z is independent of X.
Now, from Eq. (10.124) (Prob. 10.26) and setting σ² = N, we have


H(Z) = \frac{1}{2} \log_2 (2\pi e N)    (10.137)


Since X and Z are independent and E(Z) = 0, we have


E(Y^2) = E[(X + Z)^2] = E(X^2) + 2E(X)E(Z) + E(Z^2) = S + N    (10.138)


Given E(Y²) = S + N, the differential entropy of Y is bounded by
(1/2) log_2 2πe(S + N), since the Gaussian distribution maximizes the
differential entropy for a given variance (see Prob. 10.28). Applying this
result to bound the mutual information, we obtain


I(X; Y) = H(Y) - H(Z)
        \le \frac{1}{2} \log_2 2\pi e (S + N) - \frac{1}{2} \log_2 2\pi e N
        = \frac{1}{2} \log_2 \left[ \frac{2\pi e (S + N)}{2\pi e N} \right] = \frac{1}{2} \log_2 \left( 1 + \frac{S}{N} \right)


Hence, the capacity C_s of the AWGN channel is


C_s = \max_{\{f_X(x)\}} I(X; Y) = \frac{1}{2} \log_2 \left( 1 + \frac{S}{N} \right)
10.32. Verify Eq. (10.72); that is,


C = B \log_2 \left( 1 + \frac{S}{N} \right)  bits/second


Assuming that the channel bandwidth is B Hz, then by the Nyquist sampling
theorem we can represent both the input and output by samples taken 1/(2B)
seconds apart. Each of the input samples is corrupted by the noise to produce
the corresponding output sample. Since the noise is white and Gaussian,
each of the noise samples is an i.i.d. Gaussian r.v. If the noise has power
spectral density N_0/2 and bandwidth B, then the noise has power (N_0/2)2B
= N_0 B and each of the 2BT noise samples in time T has variance N_0 BT /
(2BT) = N_0/2. Now the capacity of the discrete time Gaussian channel is
given by (Eq. (10.71))


C_s = \frac{1}{2} \log_2 \left( 1 + \frac{S}{N} \right)  bits/sample


Let the channel be used over the time interval [0, T]. Then the power per
sample is ST/(2BT) = S/(2B), the noise variance per sample is N_0/2, and
hence the capacity per sample is


C_s = \frac{1}{2} \log_2 \left( 1 + \frac{S/(2B)}{N_0/2} \right) = \frac{1}{2} \log_2 \left( 1 + \frac{S}{N_0 B} \right)  bits/sample    (10.139)


Since there are 2B samples each second, the capacity of the channel can be
rewritten as


C = B \log_2 \left( 1 + \frac{S}{N_0 B} \right) = B \log_2 \left( 1 + \frac{S}{N} \right)  bits/second


where N = N_0 B is the total noise power.


10.33. Show that the channel capacity of an ideal AWGN channel with infinite
bandwidth is given by


C_\infty = \frac{1}{\ln 2} \frac{S}{\eta} \approx 1.44 \frac{S}{\eta}  b/s    (10.140)


where S is the average signal power and η/2 is the power spectral density of
white Gaussian noise.


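The noise power N is given by N = ηB. Thus, by Eq. (10.72)


C = B \log_2 \left( 1 + \frac{S}{\eta B} \right)


Let S/(ηB) = λ. Then


C = \frac{S}{\eta \lambda} \log_2 (1 + \lambda) = \frac{1}{\ln 2} \frac{S}{\eta} \frac{\ln (1 + \lambda)}{\lambda}    (10.141)


C_\infty = \lim_{B \to \infty} B \log_2 \left( 1 + \frac{S}{\eta B} \right)
         = \frac{1}{\ln 2} \frac{S}{\eta} \lim_{\lambda \to 0} \frac{\ln (1 + \lambda)}{\lambda}


Since \lim_{\lambda \to 0} [\ln (1 + \lambda)]/\lambda = 1, we obtain


C_\infty = \frac{1}{\ln 2} \frac{S}{\eta} \approx 1.44 \frac{S}{\eta}  b/s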


Note that Eq. (10.140) can be used to estimate upper limits on the 
performance of any practical communication system whose transmission 
channel can be approximated by the AWGN channel. 
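

The approach to the infinite-bandwidth limit can be seen numerically. In the sketch below the function names and the values of S and η are illustrative assumptions; `math.log1p` is used so that the logarithm stays accurate when S/(ηB) is very small.

```python
import math


def awgn_capacity(bandwidth_hz: float, signal_power: float, eta: float) -> float:
    """C = B log2(1 + S/(eta B)) in b/s, with noise power N = eta * B."""
    lam = signal_power / (eta * bandwidth_hz)
    return bandwidth_hz * math.log1p(lam) / math.log(2)


def infinite_bandwidth_capacity(signal_power: float, eta: float) -> float:
    """Limiting capacity of Eq. (10.140): (1/ln 2) * S/eta."""
    return signal_power / (eta * math.log(2))


if __name__ == "__main__":
    S, eta = 1e-4, 2e-12  # illustrative values (W and W/Hz)
    limit = infinite_bandwidth_capacity(S, eta)
    for B in (1e3, 1e6, 1e9, 1e12):
        c = awgn_capacity(B, S, eta)
        print(f"B = {B:.0e} Hz: C = {c:.4e} b/s ({c / limit:.4f} of the limit)")
```

The capacity grows with B but never exceeds the limit, which is why Eq. (10.140) serves as an upper bound.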


10.34. Consider an AWGN channel with 4-kHz bandwidth and the noise power
spectral density η/2 = 10^{-12} W/Hz. The signal power required at the
receiver is 0.1 mW. Calculate the capacity of this channel.


B = 4000 Hz    S = 0.1(10^{-3}) W
N = ηB = 2(10^{-12})(4000) = 8(10^{-9}) W


Thus,


\frac{S}{N} = \frac{0.1(10^{-3})}{8(10^{-9})} = 1.25(10^4)


And by Eq. (10.72)


C = B \log_2 \left( 1 + \frac{S}{N} \right)
  = 4000 \log_2 [1 + 1.25(10^4)] = 54.44(10^3) b/s
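

The arithmetic of this problem can be reproduced in a few lines. This is a minimal sketch; the function name `shannon_hartley` is a label chosen here, not one used in the text.

```python
import math


def shannon_hartley(bandwidth_hz: float, snr: float) -> float:
    """Capacity C = B log2(1 + S/N) in b/s; snr is the plain power ratio, not dB."""
    return bandwidth_hz * math.log2(1 + snr)


if __name__ == "__main__":
    B = 4000.0            # Hz
    eta = 2e-12           # W/Hz, since eta/2 = 1e-12 W/Hz
    S = 0.1e-3            # W
    N = eta * B           # total noise power in W
    print(f"S/N = {S / N:.4g}")
    print(f"C   = {shannon_hartley(B, S / N):.0f} b/s")
```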


10.35. An analog signal having 4-kHz bandwidth is sampled at 1.25 times the
Nyquist rate, and each sample is quantized into one of 256 equally likely
levels. Assume that the successive samples are statistically independent.
(a) What is the information rate of this source?


(b) Can the output of this source be transmitted without error over an AWGN
channel with a bandwidth of 10 kHz and an S/N ratio of 20 dB?


(c) Find the S/N ratio required for error-free transmission for part (b).


(d) Find the bandwidth required for an AWGN channel for error-free
transmission of the output of this source if the S/N ratio is 20 dB.


(a)
f_M = 4(10^3) Hz
Nyquist rate = 2 f_M = 8(10^3) samples/s
r = 8(10^3)(1.25) = 10^4 samples/s
H(X) = \log_2 256 = 8 b/sample


By Eq. (10.10) the information rate R of the source is
R = rH(X) = 10^4(8) b/s = 80 kb/s


(b) By Eq. (10.72)
C = B \log_2 \left( 1 + \frac{S}{N} \right) = 10^4 \log_2 (1 + 10^2) = 66.6(10^3) b/s


Since R > C, error-free transmission is not possible.
(c) The required S/N ratio can be found by


C = 10^4 \log_2 \left( 1 + \frac{S}{N} \right) \ge 8(10^4)


or


\log_2 \left( 1 + \frac{S}{N} \right) \ge 8


or


1 + \frac{S}{N} \ge 2^8 = 256    \frac{S}{N} \ge 255  (= 24.1 dB)


Thus, the required S/N ratio must be greater than or equal to 24.1 dB for
error-free transmission.


(d) The required bandwidth B can be found by
C = B \log_2 (1 + 100) \ge 8(10^4)
or


B \ge \frac{8(10^4)}{\log_2 (1 + 100)} = 1.2(10^4) Hz = 12 kHz


and the required bandwidth of the channel must be greater than or equal
to 12 kHz.
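

The four parts of this problem reduce to a handful of evaluations of Eq. (10.72). The sketch below simply repeats them; all variable names are chosen for this illustration.

```python
import math

# (a) information rate of the source
f_m = 4e3                                # signal bandwidth in Hz
sample_rate = 1.25 * 2 * f_m             # 1.25 times the Nyquist rate, samples/s
bits_per_sample = math.log2(256)         # 256 equally likely levels
info_rate = sample_rate * bits_per_sample

# (b) capacity of a 10-kHz AWGN channel at S/N = 20 dB
snr = 10 ** (20 / 10)
capacity = 10e3 * math.log2(1 + snr)

# (c) S/N needed so that a 10-kHz channel carries info_rate
required_snr = 2 ** (info_rate / 10e3) - 1
required_snr_db = 10 * math.log10(required_snr)

# (d) bandwidth needed at S/N = 20 dB
required_bandwidth = info_rate / math.log2(1 + snr)

if __name__ == "__main__":
    print(f"R = {info_rate:.0f} b/s, C = {capacity:.0f} b/s, R > C: {info_rate > capacity}")
    print(f"required S/N = {required_snr:.0f} ({required_snr_db:.1f} dB)")
    print(f"required B = {required_bandwidth:.0f} Hz")
```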


Source Coding 


10.36. Consider a DMS X with two symbols x_1 and x_2 and P(x_1) = 0.9, P(x_2) = 0.1.
Symbols x_1 and x_2 are encoded as follows (Table 10-4):


TABLE 10-4


x_i    P(x_i)    CODE


Find the efficiency η and the redundancy γ of this code.


By Eq. (10.73) the average code length L per symbol is


L = \sum_{i=1}^{2} P(x_i) n_i = (0.9)(1) + (0.1)(1) = 1 b


By Eq. (10.7)


H(X) = -\sum_{i=1}^{2} P(x_i) \log_2 P(x_i)
     = -0.9 \log_2 0.9 - 0.1 \log_2 0.1 = 0.469 b/symbol


Thus, by Eq. (10.77) the code efficiency η is


η = \frac{H(X)}{L} = 0.469 = 46.9%


By Eq. (10.75) the code redundancy γ is
γ = 1 - η = 0.531 = 53.1%


10.37. The second-order extension of the DMS X of Prob. 10.36, denoted by X², is
formed by taking the source symbols two at a time. The coding of this
extension is shown in Table 10-5. Find the efficiency η and the redundancy γ
of this extension code.


TABLE 10-5
a_i              P(a_i)    CODE
a_1 = x_1 x_1    0.81      0
a_2 = x_1 x_2    0.09      10
a_3 = x_2 x_1    0.09      110
a_4 = x_2 x_2    0.01      111


L = \sum_{i=1}^{4} P(a_i) n_i = 0.81(1) + 0.09(2) + 0.09(3) + 0.01(3)
  = 1.29 b/symbol


The entropy of the second-order extension of X, H(X²), is given by


H(X^2) = -\sum_{i=1}^{4} P(a_i) \log_2 P(a_i)
       = -0.81 \log_2 0.81 - 0.09 \log_2 0.09 - 0.09 \log_2 0.09 - 0.01 \log_2 0.01
       = 0.938 b/symbol


Therefore, the code efficiency η is


η = \frac{H(X^2)}{L} = \frac{0.938}{1.29} = 0.727 = 72.7%


and the code redundancy γ is
γ = 1 - η = 0.273 = 27.3%
Note that H(X²) = 2H(X).
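

The efficiencies found in Probs. 10.36 and 10.37 can be recomputed from the symbol probabilities and code word lengths alone. The helper names below are hypothetical and chosen for this sketch.

```python
import math


def entropy_bits(probs):
    """Source entropy H = -sum p log2 p in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_length(probs, lengths):
    """Average code word length L = sum p_i n_i."""
    return sum(p * n for p, n in zip(probs, lengths))


def code_efficiency(probs, lengths):
    """Efficiency eta = H / L."""
    return entropy_bits(probs) / average_length(probs, lengths)


if __name__ == "__main__":
    # Prob. 10.36: two symbols, one binary digit each
    eta1 = code_efficiency([0.9, 0.1], [1, 1])
    # Prob. 10.37: second-order extension with code word lengths 1, 2, 3, 3
    eta2 = code_efficiency([0.81, 0.09, 0.09, 0.01], [1, 2, 3, 3])
    print(f"eta (10.36) = {eta1:.3f}, redundancy = {1 - eta1:.3f}")
    print(f"eta (10.37) = {eta2:.3f}, redundancy = {1 - eta2:.3f}")
```

Coding the extension raises the efficiency because the average length per original source symbol drops from 1 to 1.29/2 binary digits.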


10.38. Consider a DMS X with symbols x_i, i = 1, 2, 3, 4. Table 10-6 lists four
possible binary codes.


TABLE 10-6
x_i    CODE A    CODE B    CODE C    CODE D
x_1    0
x_2    11
x_3    100
x_4    110


(a) Show that all codes except code B satisfy the Kraft inequality.


(b) Show that codes A and D are uniquely decodable but codes B and C are
not uniquely decodable.


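(a) From Eq. (10.78) we obtain the following:


For code A: n_1 = n_2 = n_3 = n_4 = 2
K = \sum_{i=1}^{4} 2^{-n_i} = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = 1


For code B: n_1 = 1    n_2 = n_3 = 2    n_4 = 3
K = \sum_{i=1}^{4} 2^{-n_i} = \frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{8} = 1\frac{1}{8} > 1


For code C: n_1 = 1    n_2 = 2    n_3 = n_4 = 3
K = \sum_{i=1}^{4} 2^{-n_i} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{8} = 1


For code D: n_1 = 1    n_2 = n_3 = n_4 = 3
K = \sum_{i=1}^{4} 2^{-n_i} = \frac{1}{2} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{7}{8} < 1


All codes except code B satisfy the Kraft inequality.


(b) Codes A and D are prefix-free codes. They are therefore uniquely
decodable. Code B does not satisfy the Kraft inequality, and it is not
uniquely decodable. Although code C does satisfy the Kraft inequality, it
is not uniquely decodable. This can be seen by the following example:
Given the binary sequence 0110110. This sequence may correspond to
the source sequences x_1 x_2 x_1 x_4 or x_1 x_4 x_4.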
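

Both parts can be checked mechanically. The sketch below computes the Kraft sum from the code word lengths quoted in part (a) and then lists every way of parsing 0110110 with code C. The code C words used here (0, 11, 100, 110) are an assumption inferred from the decoding example in part (b), and the function names are hypothetical.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """K = sum of 2^(-n_i), computed exactly."""
    return sum(Fraction(1, 2 ** n) for n in lengths)


def all_parses(bits, code):
    """Return every sequence of symbols whose code words concatenate to bits."""
    if not bits:
        return [[]]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            for rest in all_parses(bits[len(word):], code):
                results.append([symbol] + rest)
    return results


if __name__ == "__main__":
    lengths = {"A": [2, 2, 2, 2], "B": [1, 2, 2, 3], "C": [1, 2, 3, 3], "D": [1, 3, 3, 3]}
    for name, ns in lengths.items():
        k = kraft_sum(ns)
        print(f"code {name}: K = {k}, satisfies Kraft: {k <= 1}")

    # Assumed code C words, consistent with the example in part (b).
    code_c = {"x1": "0", "x2": "11", "x3": "100", "x4": "110"}
    for parse in all_parses("0110110", code_c):
        print(" ".join(parse))
```

More than one parse is printed for code C, which is exactly what failing to be uniquely decodable means, even though K = 1.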


10.39. Verify Eq. (10.76); that is, 
L \ge H(X)


where L is the average code word length per symbol and H(X) is the source
entropy.


From Eq. (10.89) (Prob. 10.5), we have


\sum_{i=1}^{m} P_i \log_2 \frac{Q_i}{P_i} \le 0


where the equality holds only if Q_i = P_i. Let


Q_i = \frac{2^{-n_i}}{K}    (10.142)
where
K = \sum_{i=1}^{m} 2^{-n_i}    (10.143)
which is defined in Eq. (10.78). Then
\sum_{i=1}^{m} Q_i = \frac{1}{K} \sum_{i=1}^{m} 2^{-n_i} = 1    (10.144)
and
\sum_{i=1}^{m} P_i \log_2 \frac{2^{-n_i}}{K P_i} = \sum_{i=1}^{m} P_i \left( \log_2 \frac{1}{P_i} - n_i - \log_2 K \right)
    = -\sum_{i=1}^{m} P_i \log_2 P_i - \sum_{i=1}^{m} P_i n_i - (\log_2 K) \sum_{i=1}^{m} P_i    (10.145)
    = H(X) - L - \log_2 K \le 0
From the Kraft inequality (10.78) we have
\log_2 K \le 0    (10.146)
Thus,
H(X) - L \le \log_2 K \le 0    (10.147)
or


L \ge H(X)
The equality holds when K = 1 and P_i = Q_i.


10.40. Let X be a DMS with symbols x_i and corresponding probabilities P(x_i) = P_i,
i = 1, 2, ..., m. Show that for the optimum source encoding we require that


K = \sum_{i=1}^{m} 2^{-n_i} = 1    (10.148)


and


n_i = \log_2 \frac{1}{P_i} = I_i    (10.149)


where n_i is the length of the code word corresponding to x_i and I_i is the
information content of x_i.


From the result of Prob. 10.39, the optimum source encoding with L =
H(X) requires K = 1 and P_i = Q_i. Thus, by Eqs. (10.143) and (10.142)


K = \sum_{i=1}^{m} 2^{-n_i} = 1    (10.150)


and


P_i = Q_i = 2^{-n_i}    (10.151)
Hence,


n_i = -\log_2 P_i = \log_2 \frac{1}{P_i} = I_i


Note that Eq. (10.149) implies the following commonsense principle:
Symbols that occur with high probability should be assigned shorter code
words than symbols that occur with low probability.


10.41. Consider a DMS X with symbols x_i and corresponding probabilities P(x_i) =
P_i, i = 1, 2, ..., m. Let n_i be the length of the code word for x_i such that


\log_2 \frac{1}{P_i} \le n_i \le \log_2 \frac{1}{P_i} + 1    (10.152)


Show that this relationship satisfies the Kraft inequality (10.78), and find the
bound on K in Eq. (10.78).


Equation (10.152) can be rewritten as


-\log_2 P_i \le n_i \le -\log_2 P_i + 1    (10.153)
or
\log_2 P_i \ge -n_i \ge \log_2 P_i - 1
Then
2^{\log_2 P_i} \ge 2^{-n_i} \ge 2^{\log_2 P_i} 2^{-1}
or
P_i \ge 2^{-n_i} \ge \frac{1}{2} P_i    (10.154)
Thus,
\sum_{i=1}^{m} P_i \ge \sum_{i=1}^{m} 2^{-n_i} \ge \frac{1}{2} \sum_{i=1}^{m} P_i    (10.155)
or
1 \ge \sum_{i=1}^{m} 2^{-n_i} \ge \frac{1}{2}    (10.156)


which indicates that the Kraft inequality (10.78) is satisfied, and the bound
on K is


\frac{1}{2} \le K \le 1    (10.157)


10.42. Consider a DMS X with symbols x_i and corresponding probabilities P(x_i) =
P_i, i = 1, 2, ..., m. Show that a code constructed in agreement with Eq.
(10.152) will satisfy the following relation:


H(X) \le L \le H(X) + 1    (10.158)
where H(X) is the source entropy and L is the average code word length.


Multiplying Eq. (10.153) by P_i and summing over i yields


-\sum_{i=1}^{m} P_i \log_2 P_i \le \sum_{i=1}^{m} n_i P_i \le \sum_{i=1}^{m} P_i (-\log_2 P_i + 1)    (10.159)


Now


\sum_{i=1}^{m} P_i (-\log_2 P_i + 1) = -\sum_{i=1}^{m} P_i \log_2 P_i + \sum_{i=1}^{m} P_i
                                     = H(X) + 1
Thus, Eq. (10.159) reduces to
H(X) \le L \le H(X) + 1
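

One concrete way to satisfy Eq. (10.152) is to take n_i as the smallest integer not less than log_2 (1/P_i). The sketch below makes that choice (an assumption for illustration, since Eq. (10.152) admits other choices when log_2 (1/P_i) is an integer) and checks the bounds of Eqs. (10.157) and (10.158) for an arbitrary set of probabilities.

```python
import math


def code_lengths(probs):
    """Choose n_i = ceil(log2(1/P_i)), which satisfies Eq. (10.152)."""
    return [math.ceil(-math.log2(p)) for p in probs]


def kraft_sum(lengths):
    """K = sum of 2^(-n_i)."""
    return sum(2.0 ** -n for n in lengths)


def entropy_bits(probs):
    """H(X) = -sum P_i log2 P_i."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    probs = [0.4, 0.19, 0.16, 0.15, 0.1]  # illustrative probabilities
    lengths = code_lengths(probs)
    K = kraft_sum(lengths)
    H = entropy_bits(probs)
    L = sum(p * n for p, n in zip(probs, lengths))
    print("lengths:", lengths)
    print(f"K = {K}, 1/2 <= K <= 1: {0.5 <= K <= 1}")
    print(f"H = {H:.3f}, L = {L:.3f}, H <= L <= H + 1: {H <= L <= H + 1}")
```

Such a code is guaranteed to lie within one binary digit of the entropy, but it is generally not the shortest possible code.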


10.43. Verify Kraft inequality Eq. (10.78); that is,


K = \sum_{i=1}^{m} 2^{-n_i} \le 1


Consider a binary tree representing the code words (Fig. 10-14). This tree
extends downward toward infinity. The path down the tree is the sequence of
symbols (0, 1), and each leaf of the tree with its unique path corresponds to a
code word. Since an instantaneous code is a prefix-free code, each code
word eliminates its descendants as possible code words.


Fig. 10-14 Binary tree (levels 1 to 3; the level-3 nodes are 000, 001, 010,
011, 100, 101, 110, 111).


Let n_max be the length of the longest code word. Of all the possible nodes at
a level of n_max, some may be code words, some may be descendants of code
words, and some may be neither. A code word of level n_i has 2^{n_max - n_i}
descendants at level n_max. The total number of possible leaf nodes at level
n_max is 2^{n_max}. Hence, summing over all code words, we have


\sum_{i=1}^{m} 2^{n_{max} - n_i} \le 2^{n_{max}}


Dividing through by 2^{n_max}, we obtain


K = \sum_{i=1}^{m} 2^{-n_i} \le 1


Conversely, given any set of code words with length n_i (i = 1, ..., m) which
satisfy the inequality, we can always construct a tree. First, order the code
word lengths according to increasing length, then construct code words in
terms of the binary tree introduced in Fig. 10-14.


10.44. Show that every source alphabet X = {x_1, ..., x_m} has a binary prefix code.


Given source symbols x_1, ..., x_m, choose the code length n_i such that
2^{n_i} \ge m; that is, n_i \ge \log_2 m. Then


\sum_{i=1}^{m} 2^{-n_i} \le \sum_{i=1}^{m} \frac{1}{m} = 1


Thus, Kraft's inequality is satisfied and there is a prefix code.


Entropy Coding 


10.45. A DMS X has four symbols x_1, x_2, x_3, and x_4 with P(x_1) = 1/2, P(x_2) = 1/4, and
P(x_3) = P(x_4) = 1/8. Construct a Shannon-Fano code for X; show that this
code has the optimum property that n_i = I(x_i) and that the code efficiency is
100 percent.


The Shannon-Fano code is constructed as follows (see Table 10-7):


TABLE 10-7


x_i    P(x_i)    STEP 1    STEP 2    STEP 3    CODE
x_1    1/2       0                             0
x_2    1/4       1         0                   10
x_3    1/8       1         1         0         110
x_4    1/8       1         1         1         111


I(x_1) = -\log_2 \frac{1}{2} = 1 = n_1    I(x_2) = -\log_2 \frac{1}{4} = 2 = n_2
I(x_3) = -\log_2 \frac{1}{8} = 3 = n_3    I(x_4) = -\log_2 \frac{1}{8} = 3 = n_4


H(X) = \sum_{i=1}^{4} P(x_i) I(x_i) = \frac{1}{2}(1) + \frac{1}{4}(2) + \frac{1}{8}(3) + \frac{1}{8}(3) = 1.75


L = \sum_{i=1}^{4} P(x_i) n_i = \frac{1}{2}(1) + \frac{1}{4}(2) + \frac{1}{8}(3) + \frac{1}{8}(3) = 1.75


η = \frac{H(X)}{L} = 1 = 100%


10.46. A DMS X has five equally likely symbols. 
(a) Construct a Shannon-Fano code for X, and calculate the efficiency of the
code.
(b) Construct another Shannon-Fano code and compare the results.
(c) Repeat for the Huffman code and compare the results.


(a) A Shannon-Fano code [by choosing two approximately equiprobable (0.4
versus 0.6) sets] is constructed as follows (see Table 10-8):


TABLE 10-8


H(X) = -\sum_{i=1}^{5} P(x_i) \log_2 P(x_i) = 5(-0.2 \log_2 0.2) = 2.32
L = \sum_{i=1}^{5} P(x_i) n_i = 0.2(2 + 2 + 2 + 3 + 3) = 2.4
η = \frac{H(X)}{L} = \frac{2.32}{2.4} = 0.967 = 96.7%


(b) Another Shannon-Fano code [by choosing another two approximately
equiprobable (0.6 versus 0.4) sets] is constructed as follows (see Table
10-9):


TABLE 10-9


L = \sum_{i=1}^{5} P(x_i) n_i = 0.2(2 + 3 + 3 + 2 + 2) = 2.4


Since the average code word length is the same as that for the code of part
(a), the efficiency is the same.
(c) The Huffman code is constructed as follows (see Table 10-10):


L = \sum_{i=1}^{5} P(x_i) n_i = 0.2(2 + 3 + 3 + 2 + 2) = 2.4


Since the average code word length is the same as that for the Shannon-Fano
code, the efficiency is also the same.


TABLE 10-10


P(x_i)    CODE


10.47. A DMS X has five symbols x_1, x_2, x_3, x_4, and x_5 with P(x_1) = 0.4, P(x_2) =
0.19, P(x_3) = 0.16, P(x_4) = 0.15, and P(x_5) = 0.1.


(a) Construct a Shannon-Fano code for X, and calculate the efficiency of the
code.


(b) Repeat for the Huffman code and compare the results.
(a) The Shannon-Fano code is constructed as follows (see Table 10-11):


TABLE 10-11


H(X) = -\sum_{i=1}^{5} P(x_i) \log_2 P(x_i)
     = -0.4 \log_2 0.4 - 0.19 \log_2 0.19 - 0.16 \log_2 0.16
       - 0.15 \log_2 0.15 - 0.1 \log_2 0.1
     = 2.15


L = \sum_{i=1}^{5} P(x_i) n_i
  = 0.4(2) + 0.19(2) + 0.16(2) + 0.15(3) + 0.1(3) = 2.25


η = \frac{H(X)}{L} = \frac{2.15}{2.25} = 0.956 = 95.6%


(b) The Huffman code is constructed as follows (see Table 10-12):


L = \sum_{i=1}^{5} P(x_i) n_i
  = 0.4(1) + (0.19 + 0.16 + 0.15 + 0.1)(3) = 2.2


TABLE 10-12


P(x_i)    CODE


η = \frac{H(X)}{L} = \frac{2.15}{2.2} = 0.977 = 97.7%


The average code word length of the Huffman code is shorter than that of the
Shannon-Fano code, and thus the efficiency is higher than that of the
Shannon-Fano code.
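

The Huffman code word lengths used above can be generated programmatically. The sketch below is one common way to do it with a priority queue and is not taken from the text; it returns only the lengths n_i, since the specific 0/1 labels depend on tie-breaking and on how branches are labeled.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return the Huffman code word length for each symbol.

    Repeatedly merges the two least probable groups; every symbol in a
    merged group moves one level deeper in the code tree.
    """
    # Heap entries: (probability, unique tie-breaker, symbol indices in the group)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next_id, group1 + group2))
        next_id += 1
    return lengths


def efficiency(probs, lengths):
    """Code efficiency eta = H(X) / L."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    avg = sum(p * n for p, n in zip(probs, lengths))
    return h / avg


if __name__ == "__main__":
    probs = [0.4, 0.19, 0.16, 0.15, 0.1]   # Prob. 10.47
    n = huffman_lengths(probs)
    print("lengths:", n, "L =", sum(p * k for p, k in zip(probs, n)))
    print(f"efficiency = {efficiency(probs, n):.3f}")
```

Applied to the five equally likely symbols of Prob. 10.46, the same routine gives an average length of 2.4, matching both Shannon-Fano codes there.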


SUPPLEMENTARY PROBLEMS 


10.48. Consider a source X that produces five symbols with probabilities
1/2, 1/4, 1/8, 1/16, and 1/16. Determine the source entropy H(X).


10.49. Calculate the average information content in the English language, assuming 
that each of the 26 characters in the alphabet occurs with equal probability. 


10.50. Two BSCs are connected in cascade, as shown in Fig. 10-15. 


iM if 07 


06 ? 


Fig. 10-15 


(a) Find the channel matrix of the resultant channel.
(b) Find P(z_1) and P(z_2) if P(x_1) = 0.6 and P(x_2) = 0.4.


10.51. Consider the DMC shown in Fig. 10-16. 


Fig. 10-16 


(a) Find the output probabilities if


_ | _ _ | 
F(x) = ; and P(x,) = P(x,) = 7 
(b) Find the output entropy H(Y).


10.52. Verify Eq. (10.35), that is, 


I(X; Y) = H(X) + H(Y) - H(X, Y)


10.53. Show that H(X, Y) \le H(X) + H(Y) with equality if and only if X and Y are
independent.


10.54. Show that for a deterministic channel
H(Y | X) = 0


10.55. Consider a channel with an input X and an output Y. Show that if X and Y are
statistically independent, then H(X|Y) = H(X) and I(X; Y) = 0.


10.56. A channel is described by the following channel matrix.
(a) Draw the channel diagram.
(b) Find the channel capacity.


10.57. Let X be a random variable with probability density function f_X(x), and let Y
= aX + b, where a and b are constants. Find H(Y) in terms of H(X).


10.58. Show that H(X + c) = H(X), where c is a constant.


10.59. Show that H(X) \ge H(X|Y), and H(Y) \ge H(Y|X).


10.60. Verify Eq. (10.30), that is, H(X, Y) = H(X) + H(Y), if X and Y are
independent.


10.61. Find the pdf f_X(x) of a continuous r.v. X with E(X) = μ which maximizes the
differential entropy H(X).


10.62. Calculate the capacity of an AWGN channel with a bandwidth of 1 MHz and an
S/N ratio of 40 dB.


10.63. Consider a DMS X with m equiprobable symbols x_i, i = 1, 2, ..., m.


(a) Show that the use of a fixed-length code for the representation of x_i is
most efficient.

(b) Let n_0 be the fixed code word length. Show that if n_0 = \log_2 m, then the
code efficiency is 100 percent.


10.64. Construct a Huffman code for the DMS X of Prob. 10.45, and show that the
code is an optimum code.


10.65. A DMS X has five symbols x_1, x_2, x_3, x_4, and x_5 with respective probabilities
0.2, 0.15, 0.05, 0.1, and 0.5.
(a) Construct a Shannon-Fano code for X, and calculate the code efficiency.
(b) Repeat (a) for the Huffman code.


10.66. Show that the Kraft inequality is satisfied by the codes of Prob. 10.46. 


ANSWERS TO SUPPLEMENTARY PROBLEMS 


10.48. 1.875 b/ symbol 


10.49. 4.7 b / character 


10.50. (a) \begin{bmatrix} 0.62 & 0.38 \\ 0.38 & 0.62 \end{bmatrix}
(b) P(z_1) = 0.524, P(z_2) = 0.476


10.51. (a) P(y_1) = 7/24, P(y_2) = 17/48, and P(y_3) = 17/48
(b) 1.58 b/symbol


10.52. Hint: Use Eqs. (10.31) and (10.26). 


10.53. Hint: Use Eqs. (10.33) and (10.35). 


10.54. Hint: Use Eq. (10.24), and note that for a deterministic channel P(y_j | x_i) are
either 0 or 1.


10.55. Hint: Use Eqs. (3.32) and (3.37) in Eqs. (10.23) and (10.31). 


10.56. (a) See Fig. 10-17.
(b) 1 b/symbol


Fig. 10-17
10.57. H(Y) = H(X) + \log_2 |a|
10.58. Hint: Let Y=X +c and follow Prob. 10.29. 
10.59. Hint: Use Eqs. (10.31), (10.33), and (10.34). 


10.60. Hint: Use Eqs. (10.28), (10.29), and (10.26). 


10.61. f_X(x) = \frac{1}{\mu} e^{-x/\mu}    x \ge 0


10.62. 13.29 Mb/s 


10.63. Hint: Use Eqs. (10.73) and (10.76). 


10.64.
Symbols: x_1  x_2  x_3  x_4
Code:    0    10   110  111
10.65. 
(a) Symbols: — x, X, ty x, 


Code: | () HO | oTTTT TAO 
Code efficiency = 98.6 percent. 


(b) Symbols: x, 1s i ij 


Code: I] 100 OTT 1010 
Code efficiency 7 = 98.6 percent. 


10.66. Hint: Use Eq. (10.78) 


APPENDIX A 


Normal Distribution 


Fig. A 


TABLE A Normal Distribution ®(z) 
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The material below refers to Fig. A. 
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APPENDIX B 


Fourier Transform 


B.1 Continuous-Time Fourier Transform 


Definition: 
X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt    x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega) e^{j\omega t} d\omega


TABLE B-1 Properties of the Continuous-Time Fourier Transform 
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TABLE B-2 Common Continuous-Time Fourier Transform Pairs 
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B.2 Discrete-Time Fourier Transform 


Definition: 
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TABLE B-4 Common Discrete-Time Fourier Transform Pairs 
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Please note that index links point to page beginnings from the print edition. 
Locations are approximate in e-readers, and you may need to page down 
one or more times after clicking a link to get to the indexed material. 


a priori probability, 333 
a posteriori probability, 333 
Absorbing barrier, 241 

states, 215 
Absorption, 215 

probability, 215 
Acceptance region, 331 
Accessible states, 214 
Additive white gaussian noise channel, 397 
Algebra of sets, 2—6, 15 
Alternative hypothesis, 331 
Aperiodic states, 215 
Arrival (or birth) parameter, 350 
Arrival process, 216, 253 
Assemble average, 208 
Autocorrelation function, 208, 273 
Autocovariance function, 209 
Average information, 368 
Axioms of probability, 8 


Band-limited, channel, 376 
white noise, 308 
Bayes’, estimate, 314 


estimator, 314 
estimation, 314, 321 
risk, 334 
rule, 10 
test, 334 
theorem, 10 
Bernoulli, distribution, 55 
experiment, 44 
process, 222 
r.v., 55 
trials, 44, 56 
Best estimator, 314 
Biased estimator, 316 
Binary, communication channel, 117—118 
erasure channel, 386, 392 
symmetrical channel (BSC), 371 
Binomial, distribution, 56 
coefficient, 56 
r.v., 56 
Birth-death process, 350 
Birth parameter, 350 
Bivariate, normal distribution, 111 
rv., 101, 122 
Bonferroni’s inequality, 23 
Boole’s inequality, 24 
Brownian motion process (see Wiener process) 
Buffon’s needle, 128 


Cardinality, 4 
Cauchy, criterion, 284 
r.v., 98 
Cauchy-Schwarz inequality, 133, 154, 186 
Central limit theorem, 64, 158, 198-199 


Chain, 208 

Markov, 210 
Channel, band-limited, 376 

binary symmetrical, 371 

capacity, 391, 393 

continuous, 374, 393 

deterministic, 370 

discrete memoryless (DMC), 369 

lossless, 370 

matrix, 238, 369 

noiseless, 371 

representation, 369 

transition probability, 369 
Chapman-Kolmogorov equation, 212
Characteristic function, 156-157, 196 
Chebyshev inequality, 86 
Chi-square (χ²) r.v., 181
Code, classification, 377 

distinct, 378 

efficiency, 377 

Huffman, 380 

instantaneous, 378 

length, 377 

optimal, 378 

prefix-free, 378 

redundancy, 377 

Shannon-Fano, 405 

uniquely decodable, 378
Coding, entropy, 378, 405 

source, 400 
Complement of set, 2 
Complex random process, 208, 280 
Composite hypothesis, 331 


Concave function, 153 
Conditional, distribution, 64, 92, 105, 128 
expectation, 107, 219 
mean, 107, 135 
probability, 10, 31 
probability density function (pdf), 105 
probability mass function (pmf), 105 
variance, 107, 135 
Confidence, coefficient, 324 
interval, 324 
Consistent estimator, 313 
Continuity theorem of probability, 26 
Continuity correction, 201 
Convex function, 153 
Convolution, 168, 276 
integral, 276 
sum, 277 
Correlation, 107 
coefficient, 106-107, 131, 208 
Counting process, 217 
Poisson, 217 
Covariance, 106-107, 131, 208 
matrix, 111 
stationary, 230 
Craps, 45 
Critical region, 331 
Cross-correlation function, 273, 
Cross power spectral density (or spectrum), 274 
Cumulative distribution function (cdf), 52 


Death parameter, 351 
Decision test, 332, 338 
Bayes’, 334 


likelihood ratio, 333 
MAP (maximum a posteriori), 333 
maximum-likelihood, 332 
minimax (min-max), 335 
minimum probability of error, 334
Neyman-Pearson, 333 
Decision theory, 331 
De Morgan’s laws, 6, 18 
Departure (or death) parameter, 351 
Detection probability, 332 
Differential entropy, 394 
Dirac δ function, 275
Discrete, memoryless channel (DMC), 369 
memoryless source (DMS), 369 
rv., 71, 116 
Discrete-parameter Markov chain, 235 
Disjoint sets, 3 
Distribution: 
Bernoulli, 55 
binomial, 56 
conditional,64, 92, 105, 128 
continuous uniform, 61 
discrete uniform, 60 
exponential, 61 
first-order, 208 
gamma, 62 
geometric, 57 
limiting, 216 
multinomial, 110 
negative binomial, 58 
normal (or gaussian), 63 
nth-order, 208 
Poisson, 59 


second-order, 208 
stationary, 216 
uniform, continuous, 61 
discrete, 60 
Distribution function, 52, 66 
cumulative (cdf), 52 
Domain, 50, 101 
Doob decomposition, 221 


Efficient estimator, 313 
Eigenvalue, 216 
Eigenvector, 216 
Ensemble, 207 

average, 208 
Entropy, 368 

coding, 378, 405 

conditional, 371 

differential, 374 

joint, 371 

relative, 373, 375 
Equally likely events, 9, 27 
Equivocation, 372 
Ergodic, in the mean, 307 

process, 211 
Erlang’s, delay (or C) formula, 360 

loss (or B) formula, 364 
Estimates, Bayes’, 314 

point, 312 

interval, 312 

maximum likelihood, 313
Estimation, 312 

Bayes’, 314 

error, 314 


mean square, 314 
maximum likelihood, 313, 319 
mean square, 314, 325 

linear, 315 
parameter, 312 
theory, 312 

Estimator, Bayes’ 314 
best, 314 
biased, 316 
consistent, 313 
efficient, 313 

most, 313 
maximum-likelihood, 313 
minimum mean square error, 314 
minimum variance, 313
point, 312, 316 
unbiased, 312 

Event, 2, 12, 51 
certain, 2 
elementary, 2 
equally likely, 9, 27 
impossible, 4 
independent, 11, 41 
mutually exclusive, 8, 9 

and exhaustive, 10 

space (or σ-field), 6
Expectation, 54, 152-153, 182 
conditional, 107, 153, 219 
properties of, 220 
Expected value (see Mean) 
Experiment, Bernoulli, 44 
random, 1
Exponential, distribution, 61 


rv., 61 


Factorial moment, 155 
False-alarm probability, 332 
Filtration, 219, 222 
F’,-measurable, 219 


Fourier series, 279, 300 
Parseval’s theorem for, 301
Fourier transform, 280, 304 
Functions of r.v.’s, 149-152, 159, 167, 178 
Fundamental matrix, 215 


Gambler’s ruin, 241 
Gamma, distribution, 62 
function, 62, 79 
r.v.,62, 180 
Gaussian distribution (see Normal distribution) 
Geometric, distribution, 57 
memoryless property, 58, 74
r.v.,57 


Huffman, code, 380 
Encoding, 379 
Hypergeometric r.v., 97 
Hypothesis, alternative, 331 
composite, 331 
null, 331 
simple, 331 
Hypothesis testing, 331, 335 
level of significance, 332 
power of, 332 


Impulse response, 276 
Independence law, 220 


Independent (statistically), events, 11 
increments, 210 
process, 210 
r.v.’s, 102, 105 
Information, content, 368 
measure of, 367 
mutual, 371-372 
rate, 368 
source, 367 
theory, 367 
Initial-state probability vector, 213 
Interarrival process, 216 
Intersection of sets, 3 
Interval estimate, 312 


Jacobian, 151 

Jensen’s inequality, 153, 186, 220

Joint, characteristic function, 157 
distribution function,102, 112 
moment generating function, 156 
probability density function (pdf), 104, 122 
probability mass function (pmf), 103, 116 
probability matrix, 370 


Karhunen-Loéve expansion, 280, 300 
Kraft inequality, 378, 404 
Kullback-Leibler divergence, 373 


Lagrange multiplier, 334, 384, 394, 396 
Laplace r.v., 98 
Law of large numbers, 158, 198 
Level of significance, 332 
Likelihood, function, 313 
ratio, 333 


test, 333 
Limiting distribution, 216 
Linearity, 153, 220 
Linear mean-square estimation, 314, 327 
Linear system, 276, 294 
continuous-time, 276 
impulse response of, 276 
response to random inputs, 277, 294 
discrete-time, 276 
impulse (or unit sample) response, 277 
response to random inputs, 278, 294 
Little’s formula, 350 
Log-normal r.v., 165 


MAP (maximum a posteriori) test, 333 
Marginal, distribution function, 103 
cumulative distribution function (cdf), 103 
probability density function (pdf), 104 
probability mass function (pmf), 103 
Markov, chains, 210 
discrete-parameter, 211, 235 
fundamental matrix, 215 
homogeneous, 212 
irreducible, 214 
nonhomogeneous, 212 
regular, 216 
inequality, 86 
matrix, 212 
process, 210 
property, 211 
Maximum likelihood estimator, 313, 319 
Mean, 54, 86, 208 
conditional, 107 


Mean square, continuity, 271 
derivative, 272 
error, 314 
minimum, 314 
estimation, 314, 325 
linear, 315 
integral, 272 
Median, 97 
Memoryless property (or Markov property), 58, 62, 74, 94, 211 
Mercer’s theorem, 280 
Measurability, 220 
Measure of information, 380 
Minimax (min-max) test, 335 
Minimum probability of error test, 334 
Minimum variance estimator, 313 
Mixed r.v., 54 
Mode, 97 
Moment, 55, 155 
Moment generating function, 155, 191 
joint, 156 
Most efficient estimator, 313 
Multinomial, coefficient, 140 
distribution, 110 
r.v., 110 
theorem, 140 
trial, 110 
Multiple r.v., 101 
Mutual information, 367, 371, 387 
Mutually exclusive, events, 3, 9 
and exhaustive events, 10 
sets, 3 


Negative binomial r.v., 58 


Neyman-Pearson test, 332, 340 
Nonstationary process, 209 
Normal, distribution, 63, 411 
bivariate, 111 
n-variate, 111 
process, 211, 234 
r.v., 63 
standard, 64 
Null, event (set), 3 
hypothesis, 331 
recurrent state, 214 


Optimal stopping theorem, 221, 265 
Orthogonal, processes, 273 

r.v., 107 
Orthogonality principle, 315 
Outcomes, 1


Parameter estimation, 312 
Parameter set, 208 
Parseval’s theorem, 301 
Periodic states, 215 
Point estimators, 312 
Point of occurrence, 216 
Poisson, distribution, 59 

process, 216, 249 

r.v., 59 

white noise, 293 
Polya’s urn, 263 
Positive recurrent states, 214 
Positivity, 220 
Posterior probability, 333 
Power, function, 336 


of test, 332 
Power spectral density (or spectrum), 273, 288 
cross, 274 
Prior probability, 333 
Probability, 1 
conditional, 10, 31 
continuity theorem of, 26 
density function (pdf), 54 
distribution, 213 
generating function, 154, 187 
initial state, 213 
mass function (pmf), 53 
measure, 7-8
space, 6-7, 21
total, 10, 38 
Projection law, 220 


Queueing, system, 349 
M/M/1, 352 
M/M/1/K, 353 
M/M/s, 352 
M/M/s/K, 354 
theory, 349 


Random, experiment, 1
process, 207 
complex, 208 
Fourier transform of, 280 
independent, 210 
real, 208 
realization of, 207 
sample function of, 207 
sample, 199, 312
sequence, 208
telegraph signal, 291 
semi, 290 
variable (r.v.), 50 
continuous, 54 
discrete, 53 
function of, 149 
mixed, 54 
uncorrelated, 107 
vector, 101, 108, 111, 137 
walk, 222 
simple, 222 
Range, 50, 101 
Rayleigh r.v., 78, 178 
Real random process, 208 
Recurrent process, 216 
Recurrent states, 214 
null, 214 
positive, 214 
Regression line, 315 
Rejection region, 331 
Relative frequency, 8 
Renewal process, 216 


Sample, function, 207
mean, 158, 199, 316
point, 1
random, 312
space, 1, 12
variance, 329
vector (see Random sample)
Sequence, decreasing, 25
increasing, 25
Sets, 1
algebra of, 2-6, 15
cardinality of, 4 
countable, 2 
difference of, 3 
symmetrical, 3 
disjoint, 3 
intersection of, 3 
mutually exclusive, 3 
product of, 4 
size of, 4 
union of, 3 
Shannon-Fano coding, 379 
Shannon-Hartley law, 376 
Sigma field (see event space) 
Signal-to-noise (S/N) ratio, 370 
Simple, hypothesis, 331 
random walk, 222 
Source, alphabet, 367 
coding, 376, 400 
theorem, 377 
encoder, 376 
Stability, 220 
Standard, deviation, 55 
normal r.v., 63 
State probability vector, 213 
State space, 208 
States, absorbing, 215 
accessible, 214 
aperiodic, 215 
periodic, 215 
recurrent, 214 
null, 214
positive, 214
transient, 214 
Stationary, distributions, 216 
independent increments, 210 
processes, 209 
strict sense, 209 
weak, 210 
wide sense (WSS), 210 
transition probability, 212 
Statistic, 312 
sufficient, 345 
Statistical hypothesis, 331 
Stochastic, continuity, 271 
derivative, 272 
integral, 272 
matrix (see Markov matrix) 
periodicity, 279 
process (see Random process) 
Stopping time, 221 
optimal, 221 
System, linear, 276 
linear time-invariant (LTI), 276
response to random inputs, 276, 294 
parallel, 43 
series, 42 


Threshold value, 333 
Time-average, 211
Time autocorrelation function, 211
Total probability, 10, 38
Tower property, 220
Traffic intensity, 352
Transient states, 214
Transition probability, 212
matrix, 212
stationary, 212
Type I error, 332
Type II error, 332


Unbiased estimator, 312 
Uncorrelated r.v.’s, 107 
Uniform, distribution, 60-61 
continuous, 60 
r.v., 60 
discontinuous, 61 
r.v., 61
Union of sets, 3 
Unit, impulse function (see Dirac δ function)
impulse sequence, 275 
sample response, 277 
sample sequence, 275 
step function, 171 
Universal set, 1


Variance, 55
conditional, 107
Vector mean, 111
Venn diagram, 4


Waiting time, 253 
White noise, 275, 292 
normal (or gaussian), 293 
Poisson, 293 
Wiener-Khinchin relations, 274 
Wiener process, 218, 256, 303 
standard, 218 
with drift coefficient, 219 


Z-transform, 154 


